Abstract



In this paper, we study a simple correlation-based strategy for estimating the unknown
delay and amplitude of a signal based on a small number of noisy, randomly chosen
frequency-domain samples. We model the output of this "compressive matched filter"
as a random process whose mean equals the scaled, shifted autocorrelation function
of the template signal. Using tools from the theory of empirical processes, we prove
that the expected maximum deviation of this process from its mean decreases sharply
as the number of measurements increases, and we also derive a probabilistic tail bound
on the maximum deviation. Putting all of this together, we bound the minimum number
of measurements required to guarantee that the empirical maximum of this random
process occurs sufficiently close to the true peak of its mean function. We conclude that
for broad classes of signals, this compressive matched filter will successfully estimate
the unknown delay (with high probability, and within a prescribed tolerance) using
a number of random frequency-domain samples that scales inversely with the
signal-to-noise ratio and only logarithmically in the observation bandwidth and the
possible range of delays.



1 Introduction 



1.1 Random Sampling and Compressive Signal Processing 



Over the last few decades, the development of cheap, flexible, and powerful digital signal 
processing (DSP) architectures has enabled the acquisition and analysis of increasingly rich 
data sets. One of the key principles behind the DSP revolution is the fundamental work by 
Nyquist, Whittaker, and Shannon in characterizing the minimum number of discrete-time 
samples required to fully capture the information in a bandlimited continuous-time signal. 
Unfortunately, many real-world signals of interest may have very high bandwidth, which
can severely complicate the practical task of sampling a signal at its Nyquist rate [21, 35].
The recently developed theory of Compressive Sensing (CS) [5, 9] suggests that if a signal is
structured, then we can acquire it by taking samples well below its Nyquist rate. CS relies 
on two fundamental principles: first, that many signals have much lower complexity than is 
suggested by their bandwidth (typically this is embodied in a sparse representation for the 
signal within some basis), and second, that such signals may safely be sampled below their 
Nyquist rate if the traditional uniform time-domain sampling procedure is replaced with a 
generalized linear measurement operator (typically this operator contains some degree of 
randomness). 

The CS theory has benefited from several powerful and elegant tools for probabilistic analysis 
relating to the theory of empirical processes. The essential condition (the restricted isometry 
property [4]) that guarantees sparse recovery from observations through a random matrix 
can be recast as a bound on a random process. This formulation, first put forth in [34], is
particularly useful when the compressive measurement system is structured [28, 30, 32, 33, 35].
In these works, the Dudley inequality [12], a classical tool which relates the supremum of a
random process to the geometry of its index set, is used to bound the expected supremum 
of the process, and strong tail bounds are established that control the deviation from the 
average behavior. To date, almost all of the work along these lines has focused on providing 
guarantees for signal recovery from compressive measurements. 

There are many applications, however, where we are not interested in a full-scale recovery of 
a signal. Instead, we may wish only to estimate some key parameters (or "features") in order 
to solve an inference problem that does not demand full knowledge of the signal. It has 
been demonstrated that random measurements can again be very useful in such settings. 
Just as certain low-complexity signals can be fully recovered from random measurements, 
certain low-complexity questions can be answered about (possibly arbitrary) signals directly 
from random measurements without first recovering the signal. Some initial steps in this 
direction have been concerned with compressive detection, classification, estimation, and 
filtering [6, 7, 11, 16, 17]. Compared to alternative techniques that base their inference on a
full set of Nyquist samples, compressive inference techniques can show slightly diminished 
accuracy because fewer statistics are measured concerning the signal. In exchange, the 
acquisition hardware can potentially be much simpler and consume less power. In addition, 
we maintain the ability to adapt to future information we may learn about the problem at hand;
from a single set of random measurements, a number of different inference problems may 
be solved concerning a number of possible candidate signals. 

In this paper, we study the problem of matched filtering (i.e., estimating the unknown delay 
and amplitude of a known template signal) from the compressive viewpoint. In particular, 
we derive strong bounds on the performance of a compressive matched filter by bringing in 
some of the same probabilistic tools that have been so fruitful in CS recovery analysis. To do 
this, we show that the compressive matched filtering problem can be reduced to controlling 
the supremum of a certain random process, which we approach through a specialized version 
of the Dudley inequality.



1.2 Matched Filtering from Limited Frequency Samples



1.2.1 The Compressive Matched Filter 

The problem that we consider is formally stated as follows. Suppose we make observations of
the continuous-time signal $A \cdot s_0(t - \tau_0)$, where $s_0(t)$ is a known signal template,
$\tau_0 \in T \subset \mathbb{R}$ is the unknown delay (the time-of-arrival), and $A \in \mathbb{R}$
or $\mathbb{C}$ is the unknown amplitude. Given these observations, we would like to estimate
$\tau_0$ and $A$.

The optimal solution to this problem is given by the matched filter. All shifts of the known 
template $s_0(t)$ are correlated against the incoming signal, and the estimated time-of-arrival
is the shift that yields the maximum correlation. The matched filter is typically implemented 
in one of two ways: either with a specialized analog circuit that performs the correlation and 
then detects the peaks, or by sampling the signal and calculating the correlation function 
digitally. The advantage of the digital approach is the flexibility it offers; we can perform 
matched filtering against many different waveforms from the same set of samples, in case 
$s_0(t)$ is not known in advance (but belongs to a collection of candidates). If the signal $s_0(t)$ is
concentrated in time, the sampling rate must be commensurately high to accurately estimate
$\tau_0$. Applications such as high-frequency radar or ultra-wideband communications can require
sampling rates of hundreds of millions, or even billions, of samples per second. Taking and
processing samples at these kinds of rates is costly in terms of hardware complexity and 
power consumption. 

Working from compressed samples gives us a more elegant solution. In this paper we
analyze a simple correlation-based estimator that operates using a small number $m$ of noisy
samples of the spectrum of the received signal, with locations that are drawn randomly from a
uniform distribution on some interval $\Omega$ in the frequency domain. In one of our main results
(Corollary 6), we prove that for broad classes of signals, this compressive matched filter will
successfully estimate $\tau_0$ (with high probability, and within a prescribed tolerance) using a
number of random frequency-domain samples that scales inversely with the signal-to-noise
ratio and only logarithmically in the observation bandwidth $|\Omega|$ and the possible range of
delays $|T|$. Our results help validate the use of compressive measurements for capturing
important signal information. This acquisition scheme also offers us flexibility in that it
depends only on very broad characteristics of the signal; it is universally effective for all
$s_0(t)$ which are spread out over the band $\Omega$.

We note that the use of randomized measurements in the frequency domain is not
unprecedented. One of the original motivating problems for CS, for example, came from magnetic
resonance imaging (MRI), where the goal is to reconstruct an image from a partial set of
Fourier coefficients [5, 25]. Randomized frequency-domain measurements are also standard
in CS problems where the signal to be recovered is sparse in the time domain [5, 34], and
of course, in the compressive matched filter problem, the unknown signal delay is
manifested in the time domain. In hardware, the requisite spectral samples for the compressive
matched filter could be acquired by correlating the incoming signal with a bank of oscillators
tuned to random frequencies, or by following a Fourier transforming device (such as a SAW
processor [18]) with a random sampler. Although the analysis in our paper is limited to
one-dimensional signals, one could also envision formulating the matched filtering problem 
for two-dimensional images, and random samples of a two-dimensional spectrum could be 
acquired by combining the Fourier transforming property of a lens with a random sampler.



1.2.2 Analytical Framework and Summary of Main Results



In this paper, we develop an analytical framework for studying the compressive matched
filter based on tools from probability theory and empirical processes. To help build intuition,
in Sections 2.1 through 2.4 we first study the problem fully in the case of noiseless
measurements. In Section 2.5 we then extend all of our analysis in parallel fashion to account for
measurement noise. Section 4 and several appendices provide supporting proofs for all of
our main results.

For both the noiseless and noisy problem formulations, we begin by showing that the output 
of the correlation-based estimator can be modeled as a random process whose mean equals 
the scaled, shifted autocorrelation function of the template signal. Noting that the scaled, 
shifted autocorrelation function of the template signal (the mean of this random process) 
peaks at $\tau_0$, we estimate $\tau_0$ by finding the empirical maximum of the random process, and
we give guarantees about the accuracy of this estimate by showing that the random process
does not vary too much from its mean. Given the estimate of the delay, an estimate of the
amplitude $A$ follows easily via least-squares, just as with the standard matched filter.

We approach the analysis as follows. In Theorems 1 and 4 we adapt the proof of the
Dudley inequality to show that the expected maximum deviation of this random process
from its mean decreases sharply as the number of measurements increases. A bit more
formally, Theorem 1 states that in the noiseless case, the expected maximum deviation of
this process from its mean decreases roughly like $m^{-1/2}$ (normalized by the peak value of the
mean function). Theorem 4 quantifies the amount of additional deviation one would expect
based on noise in the observations. In Theorems 2 and 5 we then derive a probabilistic tail
bound on the maximum deviation of this process from its mean. Specifically, Theorem 2
guarantees that with high probability the noiseless process stays uniformly close to its mean,
and Theorem 5 guarantees that with high probability the maximum additional deviation
caused by noise is also bounded. Finally, in Corollaries 3 and 6 we pull these results
together to establish bounds on the number of measurements required to guarantee that
the empirical maximum of this random process occurs sufficiently close to the true peak
of its mean function. Specifically, Corollary 3 ensures in the noiseless case that when the
template signal has an autocorrelation function with a single prominent peak, no values of
$\tau$ far from $\tau_0$ can yield an estimate close to the true peak. Corollary 6 extends this result
to account for noise and leads to the central result: the compressive matched filter will
successfully estimate $\tau_0$ from $m$ random frequency-domain samples (with high probability,
and within a prescribed tolerance) as long as $m$ scales inversely with the signal-to-noise
ratio and logarithmically in the observation bandwidth $|\Omega|$ and the possible range of delays
$|T|$.

All of our bounds depend on the degree to which the template signal $s_0$ is concentrated in
the frequency domain. As might be expected given the uniform random sampling strategy
on $\Omega$, signals whose spectrum is relatively flat across $\Omega$ will require the fewest measurements,
while signals with highly peaked spectra will require the most. These issues are carefully
quantified and discussed in detail throughout Section 2.



1.2.3 Exchanging Time and Frequency



It is important to point out that the roles of time and frequency are completely
interchangeable in our analysis. All of our results from Section 2 can therefore be adapted to the "dual"
problem of estimating the unknown carrier frequency of a modulated signal given a small
number of time-domain samples of that signal; the time domain becomes our observation
domain, and the frequency domain becomes the domain in which we wish to determine the
unknown shift of the known template signal. For the sake of space, we do not restate all of
our results in this context, although the conclusion is clear: a compressive matched filter can
successfully estimate an unknown modulation frequency $\omega_0$ from $m$ random time-domain
samples (with high probability, and within a prescribed tolerance) as long as $m$ scales
inversely with the signal-to-noise ratio and logarithmically in the observation duration $|T|$
and the possible range of carrier frequencies $|\Omega|$. (A Nyquist-based approach, in contrast,
would require a sampling rate linearly proportional to $|\Omega|$ but could tolerate somewhat
lower signal-to-noise ratios.) The bounds will also depend, in this case, on the degree
to which the template signal is concentrated in the time domain. Signals whose envelope 
is relatively flat across T will require the fewest measurements, while signals with highly 
peaked envelopes will require the most. 

While we do not discuss this problem further in full generality, we do briefly examine a special 
case, namely the problem of estimating the frequency of a pure sinusoidal tone from noisy 
time-domain samples. Such a problem is an ideal candidate for the compressive matched 
filter because a pure sinusoidal tone has a perfectly flat envelope in the time domain. We 
discuss this tone estimation problem in Section 3 and carefully quantify (in Corollaries 7
and 8) the number of random time-domain samples required to successfully estimate the
tone's frequency. We also address an important practical question: at how many points is it
necessary to sample (or query) the random process when searching for its peak? Using an
adaptation of Corollary 3 we show for the noiseless case that the empirical peak from a finite
set of samples of the random process (with sufficiently dense sampling) must occur within
a certain distance of the true peak of the continuous random process. One can therefore
employ a grid search strategy for implementing the compressive matched filter, and from the
empirical maximum on this grid, one can actually employ a local concave ascent to find the
exact value for $\omega_0$. We close Section 3 with a stylized application illustrating the potential
of extending this work to the problem of determining the arrival time of a linear chirp. 
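
The grid-then-refine strategy described above can be sketched as follows for a noiseless tone. This is only an illustration under stated assumptions, not the procedure analyzed in Section 3: the helper names (`tone_correlation`, `golden_section_max`), the grid spacing, and the sample setup are hypothetical, and golden-section search is used here as a simple stand-in for the local ascent step.

```python
import cmath
import math
import random

def tone_correlation(omega, samples, times):
    """|X(omega)|^2, where X(omega) = sum_k y[k] * exp(-j * omega * t_k)."""
    x = sum(yk * cmath.exp(-1j * omega * tk) for yk, tk in zip(samples, times))
    return abs(x) ** 2

def golden_section_max(f, lo, hi, iters=80):
    """Maximize f on [lo, hi], assuming f has a single peak on that interval."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:                       # peak lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = f(c)
        else:                             # peak lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

# Hypothetical noiseless setup: m random sample times on [0, 1], tone at omega0.
random.seed(0)
m, omega0, amp, omega_max, step = 30, 37.3, 1.0, 100.0, 0.25
times = [random.uniform(0.0, 1.0) for _ in range(m)]
samples = [amp * cmath.exp(1j * omega0 * t) for t in times]

# Stage 1: coarse grid search over the candidate frequency range.
grid = [-omega_max + i * step for i in range(int(2 * omega_max / step) + 1)]
coarse = max(grid, key=lambda w: tone_correlation(w, samples, times))

# Stage 2: local refinement within one grid cell on either side of the coarse peak.
omega_hat = golden_section_max(lambda w: tone_correlation(w, samples, times),
                               coarse - step, coarse + step)
```

With sample times confined to an interval of length 1, the noiseless correlation magnitude has a single peak for frequency offsets smaller than $\pi$, which is why a one-dimensional bracketing search suffices once the grid has located the main lobe.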



1.3 Related Work 

To the best of our knowledge, our framework for studying the compressive matched filter is 
novel. Prior statistical analysis for compressive inference problems has focused specifically 
on problems of signal detection or classification from a finite model set [11, 16, 17] or employed
a geometric point-of-view based on a stable embedding of a signal family from an original
finite-dimensional signal space into a lower-dimensional measurement space [6, 7]. Our work
takes a substantially different approach, considering the inference of a continuous-valued 
shift parameter from a continuous-time received signal, and more thoroughly characterizing 
the statistics of the problem using the language and tools of empirical processes. 

As mentioned above, similar probabilistic tools have been employed in CS, but for the
analysis of the sparse signal recovery problem [28, 30, 32-34]. While in principle one could
view the matched filter problem as that of recovering a 1-sparse signal from a dictionary
$\{s_0(t - \tau) : \tau \in T\}$ of possible candidates, such a dictionary would have infinite size and
extremely high coherence, preventing the application of most standard recovery analysis
techniques. One recent work [14] has formalized the matched filter problem using signal
recovery principles and a union of subspaces model. However, this work is quite different
from ours in that it does not theoretically study noise sensitivity and relies on a
non-random sampling architecture that is carefully designed to facilitate the solution of the
recovery problem. Interestingly, outside the field of CS, very similar random processes to
those that we study have also arisen in the analysis of the spectral norm of random Toeplitz
matrices [27].



The second part of this paper adapts our analysis of the compressive matched filter to 
the problem of estimating the frequency of a pure sinusoidal tone from a small number 
of random time-domain samples. The recovery of signals that are sparse in the frequency 
domain based on compressive measurements is a problem that has been well-studied in the 
CS literature, although most work in this area has been concerned with signals that can be 
written as trigonometric polynomials [5, 15, 20, 35]. Some techniques for recovering off-grid
frequency-sparse signals have been proposed that involve windowing [35] or other classical
techniques from the field of spectral estimation [10], and other work has considered the
more general problem of recovering continuous-time signals based on a union of subspaces
model [13], but the analysis that we present is more sharply focused on the statistics of the
simpler pure tone estimation problem. 

Finally, we would like to point out some of the differences between the tone estimation 
problem considered in this paper and the classical problem of estimating the power spectrum 
of a random process from samples at random locations (see [2, 3, 24, 26]). In Sections 2
and 3 we will show how the output of the compressive matched filter is a random process
whose mean is the template autocorrelation function. This random process is completely 
specified by the samples we have observed, and rather than merely estimating its second- 
order statistics, we will be interested in establishing a uniform bound on its deviation from 
the template; this will allow us to conclude that it peaks at or near the correct location. It 
is also worth mentioning that our work differs from Rife and Boorstyn's classical analysis of
the single-tone parameter estimation problem [31]. Specifically, our work permits sampling
below the Nyquist rate, and with high probability we provide an absolute bound on the
accuracy of the frequency estimate, rather than involving the Cramér-Rao bound.



2 Analytical Framework and Main Results 

2.1 Problem Statement 
2.1.1 Signal Model 

Suppose we have received a signal $A \cdot s_0(t - \tau_0)$, where $s_0(t)$ is a known signal template, and
$\tau_0$ and $A$ are the unknown delay and amplitude, respectively. We assume that the unknown
delay $\tau_0$ (also called the time-of-arrival) is restricted to some interval
$T = [\tau_{\min}, \tau_{\max}] \subset \mathbb{R}$.
We make no particular assumptions about $s_0$, although our bounds will depend on the
properties of $s_0$ over the range of frequencies where it is observed.

We will consider two closely related cases in this paper: in the real case, we restrict both
$s_0$ and $A$ to be real-valued, whereas in the complex case, we allow both $s_0$ and $A$ to be
complex-valued. Much of our analysis will be identical for the real and complex cases, but
we will specialize our discussions to distinguish between the two cases when necessary.



2.1.2 Observations 

We would like to estimate $\tau_0$ and $A$ based on random samples of the Fourier transform of
the received signal. In particular, we suppose that we acquire $m$ samples of the Fourier
transform of $A \cdot s_0(t - \tau_0)$ at frequencies $\omega_1, \omega_2, \ldots, \omega_m$, which are drawn independently at
random from a uniform distribution on some interval $\Omega = [-\Omega_{\max}, \Omega_{\max}]$ in the frequency
domain. Typically, one would choose $\Omega$ roughly equal to the essential bandwidth of $s_0$,
although this is not strictly necessary; we more carefully discuss the implications of choosing
$\Omega$ in Section 2.5 below.

The vector of observations $y \in \mathbb{C}^m$ is formed as

$$y[k] = A \int_{-\infty}^{\infty} s_0(t - \tau_0)\, e^{-j\omega_k t}\, dt = A \cdot e^{-j\omega_k \tau_0}\, \hat{s}_0(\omega_k), \quad k = 1, 2, \ldots, m, \tag{1}$$

where $\hat{s}_0(\omega)$ denotes the Fourier transform of $s_0(t)$. For the moment, we assume that
these observations are noiseless. In Section 2.5 we extend our formulation to account for
measurement noise.

We define $s(t)$ to be a low-pass filtered version of $s_0(t)$ having frequency content bandlimited
to the interval $\Omega$. More formally,

$$s(t) = \frac{1}{2\pi} \int_{\Omega} \hat{s}_0(\omega)\, e^{j\omega t}\, d\omega.$$

It follows that $\hat{s}(\omega) = \hat{s}_0(\omega)$ for all $\omega \in \Omega$ and that $\hat{s}(\omega) = 0$ for all $\omega \notin \Omega$. Thus, because
our Fourier-domain observations are limited to the interval $\Omega$, we may rewrite the expression
for our observations as

$$y[k] = A \cdot e^{-j\omega_k \tau_0}\, \hat{s}(\omega_k), \quad k = 1, 2, \ldots, m.$$

Consequently, all of our subsequent analysis will depend only on properties of the
bandlimited signal $s(t)$.



2.1.3 Least-Squares Estimation 

Given the observation vector $y$, a natural approach to estimating $\tau_0$ and $A$ is to find the
delay and amplitude which best explain the measurements in a least-squares sense. (Such
a least-squares estimate coincides with the maximum likelihood estimate in the case of
Gaussian measurement noise, as we consider in Section 2.5.) More formally, we define

$$(\hat{\tau}_0, \hat{A}) := \arg\min_{\tau, A} \sum_{k=1}^{m} \left| y[k] - A \cdot e^{-j\omega_k \tau}\, \hat{s}(\omega_k) \right|^2 = \arg\min_{\tau, A} \| y - A \psi_\tau \|_2^2, \tag{2}$$

where for any $\tau \in T$, the test vector $\psi_\tau \in \mathbb{C}^m$ is given by:

$$\psi_\tau[k] = e^{-j\omega_k \tau}\, \hat{s}(\omega_k), \quad k = 1, 2, \ldots, m.$$

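For a given estimate $\hat{\tau}_0$ of the delay, one can derive a closed form expression for the amplitude
$\hat{A}$ that minimizes (2):

$$\hat{A} = \begin{cases} \dfrac{\mathrm{Re}\,\langle y, \psi_{\hat{\tau}_0} \rangle}{\| \psi_{\hat{\tau}_0} \|_2^2}, & \text{real case}, \\[2ex] \dfrac{\langle y, \psi_{\hat{\tau}_0} \rangle}{\| \psi_{\hat{\tau}_0} \|_2^2}, & \text{complex case}. \end{cases} \tag{3}$$

Plugging (3) into (2), we see that the optimal time-of-arrival estimate is given by

$$\hat{\tau}_0 = \begin{cases} \arg\min_{\tau \in T} \left\| y - \dfrac{\mathrm{Re}\,\langle y, \psi_\tau \rangle}{\| \psi_\tau \|_2^2}\, \psi_\tau \right\|_2^2, & \text{real case}, \\[2ex] \arg\min_{\tau \in T} \left\| y - \dfrac{\langle y, \psi_\tau \rangle}{\| \psi_\tau \|_2^2}\, \psi_\tau \right\|_2^2, & \text{complex case}. \end{cases}$$

Finally, noting that $\| \psi_\tau \|_2$ is constant over all $\tau \in T$, we obtain a simplified expression for
the least-squares estimate of $\tau_0$:

$$\hat{\tau}_0 = \begin{cases} \arg\max_{\tau \in T} \left| \mathrm{Re}\,\langle y, \psi_\tau \rangle \right|, & \text{real case}, \\ \arg\max_{\tau \in T} \left| \langle y, \psi_\tau \rangle \right|, & \text{complex case}. \end{cases} \tag{4}$$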



Equation (4) suggests a correlation-based strategy for estimating $\tau_0$; this strategy is a natural
generalization of the traditional time-domain "matched filter" to our measurement setting.
For this reason, we refer to such an estimator as a compressive matched filter, and our focus
in this paper will be on the accuracy with which $\tau_0$ can be estimated using such an estimator.
Because $\hat{A}$ is subsequently defined in terms of $\hat{\tau}_0$, one could easily extend our analysis to
bound the accuracy of estimating $A$.
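
To make the estimator concrete, the sketch below carries out the correlation search described above on a finite grid of candidate delays and then applies the closed-form least-squares amplitude. It is illustrative only: the Gaussian template (borrowed from the example in Section 2.3), the grid spacing, and the helper names `gaussian_spectrum` and `compressive_matched_filter` are assumptions made here, and the continuous maximization over $T$ is replaced by a grid search.

```python
import cmath
import math
import random

def gaussian_spectrum(omega, sigma):
    """Fourier transform of s0(t) = pi^(-1/4) sigma^(-1/2) exp(-t^2 / (2 sigma^2))."""
    return (4.0 * math.pi) ** 0.25 * math.sqrt(sigma) * math.exp(-0.5 * (sigma * omega) ** 2)

def compressive_matched_filter(y, omegas, spectrum, tau_grid, real_case=True):
    """Grid search for the delay, then closed-form least-squares amplitude."""
    s_vals = [spectrum(w) for w in omegas]
    norm_sq = sum(abs(s) ** 2 for s in s_vals)   # ||psi_tau||^2, identical for every tau
    best_tau, best_corr, best_score = None, 0j, -1.0
    for tau in tau_grid:
        # <y, psi_tau> = sum_k y[k] * conj(exp(-j w_k tau) * s_hat(w_k))
        corr = sum(yk * (cmath.exp(-1j * w * tau) * s).conjugate()
                   for yk, w, s in zip(y, omegas, s_vals))
        score = abs(corr.real) if real_case else abs(corr)
        if score > best_score:
            best_tau, best_corr, best_score = tau, corr, score
    amp = (best_corr.real if real_case else best_corr) / norm_sq
    return best_tau, amp

# Hypothetical noiseless experiment: m random frequencies on [-3/sigma, 3/sigma].
random.seed(0)
sigma, tau0, amp_true, m = 1 / 200, 0.4, 1.0, 50
omega_max = 3.0 / sigma
omegas = [random.uniform(-omega_max, omega_max) for _ in range(m)]
spectrum = lambda w: gaussian_spectrum(w, sigma)
y = [amp_true * cmath.exp(-1j * w * tau0) * spectrum(w) for w in omegas]
tau_grid = [i / 2000 for i in range(2001)]       # grid on T = [0, 1], spacing 5e-4
tau_hat, amp_hat = compressive_matched_filter(y, omegas, spectrum, tau_grid)
```

Because the true delay lies on the grid in this setup, the noiseless search returns it exactly; with an off-grid delay the estimate is only as accurate as the grid spacing unless a local refinement step is added.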



2.2 Noiseless Analysis 

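In order to study the performance of a correlation-based estimator for $\tau_0$, let us define the
complex-valued random process $X(\tau)$ on $T$ to be the correlation of the observations $y$ with
each of the test vectors $\psi_\tau$,

$$X(\tau) := \langle y, \psi_\tau \rangle = A \sum_{k=1}^{m} |\hat{s}(\omega_k)|^2\, e^{j\omega_k (\tau - \tau_0)}. \tag{5}$$

This random process has mean function

$$\mathbb{E} X(\tau) = A \sum_{k=1}^{m} \mathbb{E}\left[ |\hat{s}(\omega_k)|^2\, e^{j\omega_k (\tau - \tau_0)} \right] = \frac{A m}{|\Omega|} \int_{\Omega} |\hat{s}(\omega)|^2\, e^{j\omega (\tau - \tau_0)}\, d\omega = \frac{2\pi m A}{|\Omega|}\, R_{ss}(\tau - \tau_0), \tag{6}$$

where $R_{ss}(\cdot) = (s(t) * s^*(-t))(\cdot)$ denotes the autocorrelation function of $s(t)$.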

In the complex case, the compressive matched filter estimate (4) for $\tau_0$ can be interpreted
as a search for the maximizer of $|X(\tau)|$. Because $|R_{ss}(\cdot)|$ is maximized at the origin, one
would expect informally that, on average, finding the maximum magnitude of the process
$X(\tau)$ should correctly estimate $\tau_0$. In the real case, the compressive matched filter estimate
for $\tau_0$ can be interpreted as a search for the maximizer of $|\mathrm{Re}(X(\tau))|$. However, in this case
we note that since $R_{ss}(\cdot)$ is real, we will have $\mathbb{E}\, \mathrm{Re}(X(\tau)) = \mathrm{Re}(\mathbb{E} X(\tau)) = \mathbb{E} X(\tau)$, which
again has magnitude maximized at $\tau_0$, and so informally, finding the maximum magnitude
of the process $\mathrm{Re}(X(\tau))$ should correctly estimate $\tau_0$.

An equivalent, and potentially more revealing, way to frame the delay estimation problem
is to observe that we could rescale $X(\tau)$ to obtain an estimate of the ideal autocorrelation
function $R_{ss}(\cdot)$ (up to the unknown amplitude $A$ and translation $\tau_0$). Define

$$\hat{R}_{ss}(\tau) := \begin{cases} \dfrac{|\Omega|}{2\pi m}\, \mathrm{Re}(X(\tau)), & \text{real case}, \\[2ex] \dfrac{|\Omega|}{2\pi m}\, X(\tau), & \text{complex case}. \end{cases} \tag{7}$$

One way to interpret this estimate is that we have approximated the scaled, shifted
autocorrelation function, $A R_{ss}(\tau - \tau_0)$, as a discrete sum with samples taken at random locations
in $\Omega$; equation (6) tells us that this estimate is unbiased, since $\mathbb{E}[\hat{R}_{ss}(\tau)] = A R_{ss}(\tau - \tau_0)$.

It is clear that solving the least-squares problem (2) is equivalent to finding the maximum
of $|\hat{R}_{ss}(\tau)|$. Our main concern will be quantifying how close the random process $\hat{R}_{ss}(\tau)$ is to
its mean. It is worth noting that, if the measurements are perfectly clean and we are able to
perform all computations to infinite precision, then $|\hat{R}_{ss}(\tau)|$ is actually guaranteed to peak
at $\tau_0$, where it takes its maximum value of $|\Omega| (|A| 2\pi m)^{-1} \|y\|_2^2$. But what Theorems 1 and
2 will tell us is that if $m$ is large enough, then there will be a tangible gap between this
peak at $\tau_0$ and the values of $|\hat{R}_{ss}(\tau)|$ for all $\tau$ bounded some distance away from $\tau_0$. As we
will then see in Section 2.5, this gap will make the maximizer of $|\hat{R}_{ss}|$ a robust estimate of
$\tau_0$ in the presence of noise.

To simplify some of the notation, we will use $\eta$ to denote the peak magnitude of the mean
function $A R_{ss}(\tau - \tau_0)$,

$$\eta = |A R_{ss}(0)| = |A|\, \|s\|_2^2 = |A|\, \frac{\|\hat{s}\|_2^2}{2\pi}.$$

Our results will depend on how concentrated the Fourier transform $\hat{s}(\omega)$ is over the sampling
domain $\Omega$. Intuitively, if $\hat{s}$ is spread out more or less evenly over $\Omega$, then each sample will
give us some information about the return signal. If $\hat{s}$ is concentrated on a small set within
$\Omega$, then only a small number of the randomly chosen samples will tell us anything at all.
We will quantify this concentration in two different ways. We introduce

$$\mu_1 = \sqrt{|\Omega|}\, \frac{\|\hat{s}\|_4^2}{\|\hat{s}\|_2^2}, \quad \text{and} \quad \mu_2 = |\Omega|\, \frac{\|\hat{s}\|_\infty^2}{\|\hat{s}\|_2^2}.$$

If the energy of $\hat{s}$ is equally spread over the sampling domain $\Omega$, that is if $|\hat{s}(\omega)| = |\Omega|^{-1/2} \|\hat{s}\|_2$
for all $\omega \in \Omega$, then it is easy to see that $\mu_1 = \mu_2 = 1$. If most of the energy in $\hat{s}(\omega)$ is
concentrated on a small subset of $\Omega$, then $\mu_1$ and $\mu_2$ will be large (and in fact, they can be
made arbitrarily large).
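
As a short check of the flat-spectrum claim, the computation below assumes the concentration measures take the form $\mu_1 = \sqrt{|\Omega|}\,\|\hat{s}\|_4^2 / \|\hat{s}\|_2^2$ and $\mu_2 = |\Omega|\,\|\hat{s}\|_\infty^2 / \|\hat{s}\|_2^2$; these forms are an assumption here, chosen because they agree with the values quoted for the Gaussian pulse in Section 2.3.

```latex
% Flat spectrum: |\hat{s}(\omega)| = |\Omega|^{-1/2} \|\hat{s}\|_2 for all \omega \in \Omega.
\begin{align*}
\|\hat{s}\|_4^4 &= \int_{\Omega} |\hat{s}(\omega)|^4 \, d\omega
   = |\Omega| \cdot |\Omega|^{-2} \|\hat{s}\|_2^4
   = |\Omega|^{-1} \|\hat{s}\|_2^4
 &&\Longrightarrow\quad
 \mu_1 = \sqrt{|\Omega|}\, \frac{\|\hat{s}\|_4^2}{\|\hat{s}\|_2^2} = 1, \\
\|\hat{s}\|_\infty^2 &= |\Omega|^{-1} \|\hat{s}\|_2^2
 &&\Longrightarrow\quad
 \mu_2 = |\Omega|\, \frac{\|\hat{s}\|_\infty^2}{\|\hat{s}\|_2^2} = 1.
\end{align*}
```

Conversely, squeezing the same energy onto a subset of $\Omega$ of measure $|\Omega|/c$ multiplies $\|\hat{s}\|_4^4$ and $\|\hat{s}\|_\infty^2$ by $c$, so under these forms $\mu_1$ grows like $\sqrt{c}$ and $\mu_2$ like $c$.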

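We start by getting a rough idea of how close $\hat{R}_{ss}(\tau)$ is to its mean by looking at the variance
at a shift $\tau$. Since $|\mathrm{Re}(X(\tau))| \le |X(\tau)|$, we can bound the second moment of $\hat{R}_{ss}(\tau)$ in both
the real and complex cases:

$$\begin{aligned}
\mathbb{E}\, |\hat{R}_{ss}(\tau)|^2 &\le \frac{|\Omega|^2}{4\pi^2 m^2}\, \mathbb{E}\, |X(\tau)|^2 \\
&= \frac{|A|^2 |\Omega|^2}{4\pi^2 m^2} \sum_{k_1=1}^{m} \sum_{k_2=1}^{m} \mathbb{E}\left[ |\hat{s}(\omega_{k_1})|^2 |\hat{s}(\omega_{k_2})|^2\, e^{j(\omega_{k_1} - \omega_{k_2})(\tau - \tau_0)} \right] \\
&= \frac{|A|^2 |\Omega|^2}{4\pi^2 m^2} \left( \sum_{k_1=1}^{m} \mathbb{E}\, |\hat{s}(\omega_{k_1})|^4 + \sum_{k_1 \ne k_2} \left| \mathbb{E}\left[ |\hat{s}(\omega)|^2\, e^{j\omega(\tau - \tau_0)} \right] \right|^2 \right) \\
&= \frac{|A|^2 |\Omega|\, \|\hat{s}\|_4^4}{4\pi^2 m} + \frac{m-1}{m}\, |A|^2 |R_{ss}(\tau - \tau_0)|^2,
\end{aligned}$$

whereas $|\mathbb{E}[\hat{R}_{ss}(\tau)]|^2 = |A|^2 |R_{ss}(\tau - \tau_0)|^2$. Therefore,

$$\mathrm{Var}\left( \hat{R}_{ss}(\tau) \right) = \mathbb{E}\, |\hat{R}_{ss}(\tau)|^2 - \left| \mathbb{E}[\hat{R}_{ss}(\tau)] \right|^2 \le \frac{|A|^2 |\Omega|\, \|\hat{s}\|_4^4}{4\pi^2 m} = \frac{\eta^2 \mu_1^2}{m}.$$

Using Jensen's inequality, we then obtain a bound for the expected deviation of $\hat{R}_{ss}(\tau)$ from
its mean at a fixed shift $\tau$:

$$\mathbb{E}\, \left| \hat{R}_{ss}(\tau) - A R_{ss}(\tau - \tau_0) \right| \le \frac{\eta \mu_1}{\sqrt{m}}. \tag{8}$$

As expected, this deviation gets smaller as $m$ increases, and scales with $\mu_1$.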
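
The fixed-shift behavior above can be checked empirically with a small Monte Carlo experiment. The sketch below is a sanity check under assumptions, not part of the analysis: it uses the Gaussian pulse of Section 2.3 in the real case, approximates $s$ by $s_0$ when computing the target value, reads the bound as $\eta \mu_1 / \sqrt{m}$ with $\eta \approx |A|$ and $\mu_1 < 1.6$, and the helper name `rss_hat` and the trial count are hypothetical.

```python
import math
import random

def rss_hat(tau, omegas, tau0, amp, sigma, band):
    """Real-case estimate (|Omega| / (2 pi m)) * Re X(tau) for the Gaussian template."""
    total = 0.0
    for w in omegas:
        power = math.sqrt(4.0 * math.pi) * sigma * math.exp(-(sigma * w) ** 2)  # |s_hat(w)|^2
        total += power * math.cos(w * (tau - tau0))  # real part of |s_hat|^2 e^{j w (tau - tau0)}
    return band / (2.0 * math.pi * len(omegas)) * amp * total

random.seed(1)
sigma, tau0, amp, m, trials = 1 / 200, 0.4, 1.0, 50, 4000
omega_max = 3.0 / sigma
band = 2.0 * omega_max                      # |Omega|
tau = tau0 + 3.0 * sigma                    # a shift away from the peak

# A * R_ss(tau - tau0), approximating the bandlimited s by s0.
target = amp * math.exp(-(tau - tau0) ** 2 / (4.0 * sigma ** 2))

estimates = []
for _ in range(trials):
    omegas = [random.uniform(-omega_max, omega_max) for _ in range(m)]
    estimates.append(rss_hat(tau, omegas, tau0, amp, sigma, band))

mean_estimate = sum(estimates) / trials
mean_abs_dev = sum(abs(e - target) for e in estimates) / trials
bound = abs(amp) * 1.6 / math.sqrt(m)       # eta * mu_1 / sqrt(m) with eta ~ |A|, mu_1 < 1.6
```

Comparing `mean_estimate` with `target` illustrates unbiasedness, and comparing `mean_abs_dev` with `bound` illustrates the fixed-shift deviation bound; rerunning with larger `m` should shrink both the deviation and the bound.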

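Our first theorem gives us a uniform bound for the expected maximum deviation of $\hat{R}_{ss}(\tau)$
from its mean over all $\tau \in T$. The following result is proved in Section 4.1.

Theorem 1. Suppose that $|\Omega||T| \ge 3$.¹ Then the autocorrelation function estimate $\hat{R}_{ss}(\tau)$,
as defined in (7), obeys

$$\mathbb{E} \sup_{\tau \in T} \left| \hat{R}_{ss}(\tau) - A R_{ss}(\tau - \tau_0) \right| \le \frac{\eta \mu_1}{\sqrt{m}} \left( 4.25 \sqrt{\log(2 |\Omega||T|)} + 2.28 \right) \le 5.96 \cdot \frac{\eta \mu_1}{\sqrt{m}} \cdot \sqrt{\log(2 |\Omega||T|)}. \tag{9}$$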



The essential difference between (8) and (9) is the factor of $\sqrt{\log(2|\Omega||T|)}$; this is the
price we are paying for a bound which holds uniformly over all $\tau \in T$. The bound slowly
loosens as the time-bandwidth product $|T||\Omega|$ gets larger; this effect is weak but necessary,
as $|T||\Omega|$ affects the complexity of the random process. (A similar penalty arises in
standard CS bounds [5, 9], where the number of measurements required for successful, robust
signal recovery grows logarithmically with the ambient dimension of the signal space; this
logarithmic dependence is known to be sharp.)

¹If $|\Omega||T| < 3$, this theorem and all of our bounds still hold but with weaker constants.

We note that Theorem 1 could be proved using the Dudley inequality [12], a classical tool
which relates the supremum of a random process to the geometry of its index set; the main
challenges arise in computing covering numbers for the index set $T$ under certain metrics
defined in terms of the random process $\hat{R}_{ss}(\tau)$ and in adapting the Dudley argument to
complex numbers. To provide better insight and to obtain sharper constants, however, our
proof in Section 4.1 more directly customizes the derivation of the Dudley inequality for
our particular scenario. We also note that a simple application of the Sudakov minoration
principle [23] (after computing the necessary metrics) reveals that the bound in Theorem 1
is indeed sharp (up to a constant). Intuitively, $|\Omega||T|$ is the number of points on a grid of
resolution $1/|\Omega|$ on $T$ necessary to control the deviations of the random process.
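
For a feel for the numbers, the following sketch evaluates the two right-hand sides of the Theorem 1 bound. It assumes the prefactor is $\eta \mu_1 / \sqrt{m}$, in line with the fixed-shift bound, and the function name `expected_sup_bound` is hypothetical; it simply refuses inputs outside the stated regime $|\Omega||T| \ge 3$.

```python
import math

def expected_sup_bound(eta, mu1, m, omega_len, t_len):
    """Return (tight, simplified) right-hand sides of the Theorem 1 bound.

    eta       : peak magnitude of the mean function
    mu1       : first spectral concentration measure
    m         : number of random frequency samples
    omega_len : |Omega|, length of the sampling band
    t_len     : |T|, length of the delay interval
    """
    time_bandwidth = omega_len * t_len
    if time_bandwidth < 3:
        raise ValueError("constants are stated for |Omega||T| >= 3")
    root_log = math.sqrt(math.log(2.0 * time_bandwidth))
    scale = eta * mu1 / math.sqrt(m)          # assumed prefactor eta * mu_1 / sqrt(m)
    tight = scale * (4.25 * root_log + 2.28)
    simplified = 5.96 * scale * root_log
    return tight, simplified

# Example call with Gaussian-pulse style numbers (|Omega| = 1200, |T| = 1).
tight, simplified = expected_sup_bound(eta=1.0, mu1=1.6, m=50, omega_len=1200.0, t_len=1.0)
```

The simplification from the first form to the second relies on $\sqrt{\log(2|\Omega||T|)} \ge \sqrt{\log 6}$, which is where the condition $|\Omega||T| \ge 3$ enters the constant 5.96.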

Theorem 1 demonstrates that $\hat{R}_{ss}(\tau)$ is close to its mean in expectation; our second
theorem demonstrates that it is also close with high probability. The following is proved in
Section 4.2.



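Theorem 2. Fix $\delta > 0$ and let

$$U = C_1 \cdot \max\left( \frac{\eta \mu_1}{\sqrt{m}},\ \frac{\eta \mu_2}{m} \cdot \sqrt{\log(4/\delta)} \right) \cdot \sqrt{\log(12 |\Omega||T| / \delta)}, \tag{10}$$

where $C_1$ is a known universal constant. If $|\Omega||T| \ge 3$, then the autocorrelation function
estimate $\hat{R}_{ss}(\tau)$, as defined in (7), obeys

$$P\left[ \sup_{\tau \in T} \left| \hat{R}_{ss}(\tau) - A R_{ss}(\tau - \tau_0) \right| > U \right] \le \delta. \tag{11}$$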



2.3 Example: A Gaussian Pulse 



A concrete example will help illustrate what Theorems 1 and 2 are telling us about the
effectiveness of the compressive matched filter. Suppose that $s_0(t)$ is a real-valued Gaussian
pulse with unit energy,

$$s_0(t) = \pi^{-1/4} \sigma^{-1/2} e^{-t^2 / 2\sigma^2}. \tag{12}$$

We will assume that this pulse is received with a time-of-arrival $\tau_0$ in the interval $T = [0, 1]$,
and that it is scaled by an unknown real-valued amplitude $A$. We will also assume that
the width $\sigma$ of the pulse is much less than 1, and so to estimate $\tau_0$ reliably from samples in
the time domain, we would need on the order of $1/\sigma$ samples on $T$. Figure 1(a) shows an
example received signal $A \cdot s_0(t - \tau_0)$ for $A = 1$, $\tau_0 = 0.4$ and $\sigma = 1/200$.

The Fourier transform of $s_0$ is

$$\hat{s}_0(\omega) = (4\pi)^{1/4}\sigma^{1/2} e^{-\sigma^2\omega^2/2}.$$

We will take as our sampling domain $\Omega = [-3/\sigma, 3/\sigma]$; $\hat{s}$ is simply $\hat{s}_0$ bandlimited to this interval. The bandlimited signal $\hat{s}$ is nearly identical to $\hat{s}_0$; we can calculate

$$\|\hat{s}_0\|_2^2 = 2\pi, \qquad \|\hat{s}_0\|_4^4 = (2\pi)^{3/2}\sigma,$$

Figure 1: (a) The return signal for the Gaussian pulse from (12) with $\sigma = 1/200$ and return parameters $A = 1$ and $\tau_0 = 0.4$. (b) The Fourier transform $\hat{s}(\omega)$ on $\Omega = [-600, 600]$. Since $\hat{s}$ is relatively diffuse over $\Omega$, both measures of frequency concentration are not too large: $\mu_1 \leq 1.6$, $\mu_2 \leq 3.4$.



and standard bounds for integrating the tails of $e^{-\sigma^2\omega^2}$ show us that $\|\hat{s}\|_2^2$ is within $1.4 \cdot 10^{-4}$ of $\|\hat{s}_0\|_2^2$ and $\|\hat{s}\|_4^4$ is within $10^{-7}$ of $\|\hat{s}_0\|_4^4$. We can safely say that $\mu_1 \leq 1.6$ and $\mu_2 \leq 3.4$ (these values are the same for all choices of $\sigma$). The Fourier transform for $\sigma = 1/200$ over the range $\Omega = [-600, 600]$ is shown in Figure 1(b).
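The values of $\mu_1$ and $\mu_2$ quoted above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes the definitions $\mu_1 = \sqrt{|\Omega|}\,\|\hat{s}\|_4^2/\|\hat{s}\|_2^2$ and $\mu_2 = |\Omega|\sup_{\omega\in\Omega}|\hat{s}(\omega)|^2/\|\hat{s}\|_2^2$ (inferred from the special cases discussed in Section 2.4, since the definitions are not restated here), and the function names are hypothetical.

```python
import math

def gaussian_spectrum(omega, sigma):
    """Fourier transform of the unit-energy Gaussian pulse (12)."""
    return (4.0 * math.pi) ** 0.25 * math.sqrt(sigma) * math.exp(-(sigma * omega) ** 2 / 2.0)

def concentration_measures(sigma, n=20001):
    """Midpoint-rule estimates of (mu1, mu2) on Omega = [-3/sigma, 3/sigma].

    Assumed definitions (an inference, not quoted from the paper):
      mu1 = sqrt(|Omega|) * ||s_hat||_4^2 / ||s_hat||_2^2
      mu2 = |Omega| * sup |s_hat|^2 / ||s_hat||_2^2
    """
    lo, hi = -3.0 / sigma, 3.0 / sigma
    width = hi - lo
    step = width / n
    l2 = l4 = peak = 0.0
    for i in range(n):
        w = lo + (i + 0.5) * step
        p = gaussian_spectrum(w, sigma) ** 2   # |s_hat(w)|^2
        l2 += p * step                         # accumulates ||s_hat||_2^2
        l4 += p * p * step                     # accumulates ||s_hat||_4^4
        peak = max(peak, p)
    mu1 = math.sqrt(width) * math.sqrt(l4) / l2
    mu2 = width * peak / l2
    return mu1, mu2

mu1, mu2 = concentration_measures(sigma=1.0 / 200.0)
print(mu1, mu2)
```

Under these assumed definitions, rerunning with a different `sigma` should return essentially the same pair, which matches the remark that the values do not depend on $\sigma$.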

For a received signal with parameters $A = 1$, $\tau_0 = 0.4$, and $\sigma = 1/200$, Figure 2 shows the estimate $\widehat{R}_{ss}(\tau)$ of the scaled, shifted autocorrelation function based on $m = 10$, $20$, and $50$ random frequency-domain samples, along with the true scaled, shifted autocorrelation function $A \cdot R_{ss}(\tau - \tau_0) \approx e^{-(\tau - \tau_0)^2/(4\sigma^2)}$. In all cases, we see that $\widehat{R}_{ss}(\tau)$ reaches its peak exactly at $\tau_0$; as noted in Section 2.2, this is to be expected in the case of noiseless measurements. However, we also see a gap between the peak at $\tau_0$ and the remainder of the estimate that becomes larger as $m$ increases.
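The noiseless experiment described above can be reproduced in a few lines. This is an illustrative sketch only: the grid of candidate delays, the seed, and the function names are our own choices, and we assume the test vector has entries $\psi_\tau[k] = \hat{s}(\omega_k)e^{-i\omega_k\tau}$, which is consistent with the form of $X(\tau)$ used in Section 4.1.1.

```python
import cmath
import math
import random

def gaussian_spectrum(omega, sigma):
    """Fourier transform of the unit-energy Gaussian pulse (12)."""
    return (4.0 * math.pi) ** 0.25 * math.sqrt(sigma) * math.exp(-(sigma * omega) ** 2 / 2.0)

def matched_filter_estimate(freqs, y, sigma, taus):
    """Rescaled real part of <y, psi_tau> for each candidate delay (real case)."""
    band = 6.0 / sigma                                  # |Omega| for [-3/sigma, 3/sigma]
    scale = band / (2.0 * math.pi * len(freqs))
    weights = [gaussian_spectrum(w, sigma) for w in freqs]
    out = []
    for tau in taus:
        x = sum(yk * sk * cmath.exp(1j * w * tau) for yk, sk, w in zip(y, weights, freqs))
        out.append(scale * x.real)
    return out

def simulate(m, sigma=1.0 / 200.0, tau0=0.4, amplitude=1.0, grid=2000, seed=0):
    """Draw m random frequencies, form noiseless samples, and locate the peak."""
    rng = random.Random(seed)
    freqs = [rng.uniform(-3.0 / sigma, 3.0 / sigma) for _ in range(m)]
    y = [amplitude * cmath.exp(-1j * w * tau0) * gaussian_spectrum(w, sigma) for w in freqs]
    taus = [i / grid for i in range(grid + 1)]          # candidate delays on T = [0, 1]
    est = matched_filter_estimate(freqs, y, sigma, taus)
    peak = max(range(len(taus)), key=lambda i: abs(est[i]))
    return taus[peak], est

tau_hat, estimate = simulate(m=20)
print(tau_hat)
```

Because the measurements are noiseless and $\tau_0$ lies on the candidate grid, the peak of the estimate should land on $\tau_0$ for any draw of the frequencies; what changes with $m$ is how far the rest of the estimate stays below that peak.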

To see how this behavior is supported by our theory, note that for a Gaussian pulse with $A = 1$, we know that the mean function $R_{ss}(\tau - \tau_0) = 1$ for $\tau = \tau_0$ and $R_{ss}(\tau - \tau_0) \leq 0.1054$ for $|\tau - \tau_0| \geq 3\sigma$. (For simplicity, these calculations assume $\hat{s} = \hat{s}_0$ exactly.) If $U$ is the value from (10) in Theorem 2, then we are guaranteed that the difference between the peak value $\widehat{R}_{ss}(\tau_0)$ and any $\widehat{R}_{ss}(\tau)$ for $|\tau - \tau_0| \geq 3\sigma$ is at least $\varepsilon$ when $1 - U \geq 0.1054 + U + \varepsilon$, i.e., when $U \leq (0.8946 - \varepsilon)/2$. Note that $U$ can be made small enough for large enough $m$, namely $m \gtrsim \log(1/\sigma)$. Thus using the compressive matched filter, we can reliably infer the time-of-arrival from $\sim \log(1/\sigma)$ randomly chosen samples in the frequency domain as opposed to $\sim 1/\sigma$ equally spaced samples in the time domain.



2.4 General Noiseless Performance Characterization 

Our statements about quantifying the number of samples needed to ensure a clear separation between the peak of $|\widehat{R}_{ss}(\tau)|$ and the function away from the peak are easily generalized. The statements in this section can be interpreted as a condition on the number of samples needed to ensure the successful operation of the compressive matched filter. The result below is interesting when the underlying autocorrelation function $R_{ss}(\tau)$ has one main peak (a "main lobe") centered at $\tau = 0$, and is relatively small away from the origin. This situation is typical, but similar statements could be formulated depending on the assumptions one wishes to impose on $s(t)$ (and its autocorrelation function).

Figure 2: Estimated scaled, shifted autocorrelation function $\widehat{R}_{ss}(\tau)$ (solid blue line) and true scaled, shifted autocorrelation function $A \cdot R_{ss}(\tau - \tau_0)$ (dashed red line) for (a) $m = 10$, (b) $m = 20$, and (c) $m = 50$ measurements. $\widehat{R}_{ss}(\tau)$ is a random process whose mean is $A \cdot R_{ss}(\tau - \tau_0)$; as the number of measurements increases, this process deviates less from its mean (see Theorems 1 and 2).



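Corollary 3. Suppose there exist constants $\alpha_1 \in [0, 1)$ and $\alpha_2 > 0$ such that $|R_{ss}(\tau)| \leq \alpha_1 R_{ss}(0)$ for all $|\tau| > \alpha_2$, and choose $\varepsilon \in [0, 1 - \alpha_1)$. Suppose also that $|\Omega||T| \geq 3$ and that

$$m \geq C_2 \cdot \max\left( \frac{\log(12|\Omega||T|/\delta)}{(1 - \alpha_1 - \varepsilon)^2} \cdot \mu_1^2,\ \frac{\sqrt{\log(4/\delta)\log(12|\Omega||T|/\delta)}}{1 - \alpha_1 - \varepsilon} \cdot \mu_2 \right), \tag{13}$$

where $C_2$ is a known universal constant. Then with probability at least $1 - \delta$, $|\widehat{R}_{ss}(\tau_0)| \geq |\widehat{R}_{ss}(\tau)| + \varepsilon|A|R_{ss}(0)$ for all $\tau \in T$ such that $|\tau - \tau_0| > \alpha_2$.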



Proof. Supposing we have the concentration suggested by (11), we will have $|\widehat{R}_{ss}(\tau_0)| \geq |A|R_{ss}(0) - U$ and $|\widehat{R}_{ss}(\tau)| \leq \alpha_1|A|R_{ss}(0) + U$ for all $\tau$ such that $|\tau - \tau_0| > \alpha_2$. If (13) is satisfied with $C_2 = \max(4C_1^2, 2C_1)$, then $U \leq \frac{1}{2}|A|R_{ss}(0)(1 - \alpha_1 - \varepsilon)$ and it follows that $|A|R_{ss}(0) - U \geq \alpha_1|A|R_{ss}(0) + U + \varepsilon|A|R_{ss}(0)$. A slightly stronger version of this corollary also holds if one omits $C_2$ and chooses constants of $4C_1^2$ and $2C_1$ for the first and second terms in (13), respectively. □
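To get a feel for how the right-hand side of (13) scales, one can evaluate it directly. The sketch below is only illustrative: the universal constant $C_2$ is not specified numerically in the text, so the parameter `c2` is a placeholder (defaulting to 1) and the returned number is meaningful only up to that constant. The function name is hypothetical.

```python
import math

def noiseless_measurement_bound(mu1, mu2, alpha1, eps, omega_t, delta, c2=1.0):
    """Right-hand side of (13); c2 stands in for the unspecified constant C_2.

    omega_t is the product |Omega||T|; delta is the allowed failure probability.
    """
    gap = 1.0 - alpha1 - eps
    if gap <= 0.0:
        raise ValueError("need alpha1 + eps < 1")
    log_grid = math.log(12.0 * omega_t / delta)
    term1 = log_grid / gap ** 2 * mu1 ** 2
    term2 = math.sqrt(math.log(4.0 / delta) * log_grid) / gap * mu2
    return c2 * max(term1, term2)

# Gaussian pulse of Section 2.3: |Omega||T| = 1200, alpha1 = 0.1054, mu1 <= 1.6, mu2 <= 3.4.
print(noiseless_measurement_bound(1.6, 3.4, 0.1054, 0.1, 1200.0, 0.05))
```

Doubling `omega_t` changes the result only through the logarithm, which is the sense in which the required number of measurements grows logarithmically in $|\Omega||T|$.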



For the case of noiseless measurements, Corollary 3 ensures that no values of $\tau$ far from $\tau_0$ can give $|\widehat{R}_{ss}(\tau)|$ close to $|\widehat{R}_{ss}(\tau_0)|$. This behavior will become particularly relevant in Section 2.5, where we introduce noise into the measurement process.



We can reveal some of the intuition behind the measurement bound (13) by considering three special cases for the signal $s$. First, consider $s(t)$ for which $|\hat{s}(\omega)|$ is uniform over $\Omega$. In this case, we have $|\hat{s}(\omega)| = |\Omega|^{-1/2}\|\hat{s}\|_2$ for all $\omega \in \Omega$ and so $\mu_1 = \mu_2 = 1$. This means that the requisite number of random measurements (13) for successful operation of the compressive matched filter scales as $m \sim \log(|\Omega||T|)$.

Alternatively, consider the case where $|\hat{s}(\omega)|$ is not perfectly uniform over $\Omega$, but rather we assume that for some $\beta \geq 1$ it obeys $|\hat{s}(\omega)| \leq \beta|\Omega|^{-1/2}\|\hat{s}\|_2$ for all $\omega \in \Omega$, and so $\mu_2 \leq \beta^2$. Using the fact that $\|\hat{s}\|_4^4 \leq \|\hat{s}\|_2^2\|\hat{s}\|_\infty^2$ gives us the estimate $\mu_1 \leq \beta$. Therefore, (13) now demands that $m \sim \beta^2\log(|\Omega||T|)$; the factor of $\beta^2$ is the price we pay for the non-uniformity of $\hat{s}$.

As a final example, consider the special case where $\hat{s}$ is bandlimited to some interval $\Omega_B \subset \Omega$ and is uniform over $\Omega_B$, i.e.,

$$|\hat{s}(\omega)| = \begin{cases} |\Omega_B|^{-1/2}\|\hat{s}\|_2, & \omega \in \Omega_B, \\ 0, & \omega \in \Omega \setminus \Omega_B. \end{cases}$$

One way we could interpret this situation is that we have chosen the sampling domain $\Omega$ to be too large. In this case, we have $\mu_1 = \sqrt{|\Omega|/|\Omega_B|}$, $\mu_2 = |\Omega|/|\Omega_B|$, and so (13) now demands that $m \sim (|\Omega|/|\Omega_B|)\log(|\Omega||T|)$. The penalty $|\Omega|/|\Omega_B|$ is a natural oversampling factor since, on average, only $|\Omega_B|$ out of every $|\Omega|$ random Fourier samples will carry any information about the signal.
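For completeness, the values of $\mu_1$ and $\mu_2$ in this last case can be worked out in two lines. The derivation below is a sketch under the assumption that $\mu_1 = \sqrt{|\Omega|}\,\|\hat{s}\|_4^2/\|\hat{s}\|_2^2$ and $\mu_2 = |\Omega|\sup_{\omega\in\Omega}|\hat{s}(\omega)|^2/\|\hat{s}\|_2^2$; these definitions are inferred from the three special cases rather than quoted from this section.

```latex
% Assumed definitions (inferred, not restated in this section):
%   \mu_1 = \sqrt{|\Omega|}\,\|\hat s\|_4^2 / \|\hat s\|_2^2,
%   \mu_2 = |\Omega|\,\sup_{\omega\in\Omega}|\hat s(\omega)|^2 / \|\hat s\|_2^2.
\begin{align*}
\|\hat s\|_4^4 &= \int_{\Omega_B} \left(|\Omega_B|^{-1/2}\|\hat s\|_2\right)^4 d\omega
              = \frac{\|\hat s\|_2^4}{|\Omega_B|}, \\
\mu_1 &= \sqrt{|\Omega|}\,\frac{\|\hat s\|_2^2/\sqrt{|\Omega_B|}}{\|\hat s\|_2^2}
       = \sqrt{\frac{|\Omega|}{|\Omega_B|}}, \\
\mu_2 &= |\Omega|\,\frac{\|\hat s\|_2^2/|\Omega_B|}{\|\hat s\|_2^2}
       = \frac{|\Omega|}{|\Omega_B|}.
\end{align*}
```

Both terms of (13) then scale with $\mu_1^2 = \mu_2 = |\Omega|/|\Omega_B|$, which is the oversampling factor described above.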

2.5 Robustness in the Presence of Measurement Noise 

We now extend our analysis to account for additive complex-valued noise in our observations. For random frequencies $\{\omega_k\}$ taken uniformly from $\Omega$, we assume that the noisy measurement vector, $y_n$, is formed as

$$y_n[k] = A e^{-i\omega_k\tau_0}\hat{s}_0(\omega_k) + n_k, \quad k = 1, 2, \ldots, m,$$

where the additive noise terms $\{n_k\}$ are independent zero-mean complex-valued Gaussian random variables with variance $\sigma_n^2$, and the noise vector is $n := [n_1, n_2, \ldots, n_m]^T$. Computing the inner product of $y_n$ with the test vector $\psi_\tau$ for all $\tau \in T$ leads us to the process

$$X_n(\tau) := \langle y_n, \psi_\tau \rangle = X(\tau) + N(\tau), \tag{14}$$



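where $X(\tau)$ is as defined in Section 2.2, and $N(\tau)$ is the noise process that quantifies the effect of additive noise in our analysis:

$$N(\tau) := \langle n, \psi_\tau \rangle = \sum_{k=1}^{m} n_k\overline{\psi_\tau[k]} = \sum_{k=1}^{m} n_k\overline{\hat{s}(\omega_k)}\,e^{i\omega_k\tau}.$$

The noise process is zero-mean, i.e., $\mathrm{E}\,N(\tau) = 0$.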

Thus, in the case of noisy observations, we can estimate the ideal autocorrelation function $R_{ss}(\cdot)$ (up to the unknown amplitude $A$ and translation $\tau_0$) simply by rescaling the noisy random process $X_n(\tau)$. Let us define

$$\widetilde{N}(\tau) := \begin{cases} \frac{|\Omega|}{2\pi m} \cdot \mathrm{Re}(N(\tau)), & \text{real case}, \\ \frac{|\Omega|}{2\pi m} \cdot N(\tau), & \text{complex case}, \end{cases} \tag{15}$$

and note that in either case, $\mathrm{E}[\widetilde{N}(\tau)] = 0$. Then, if we set

$$\widehat{R}_{ss,n}(\tau) := \begin{cases} \frac{|\Omega|}{2\pi m} \cdot \mathrm{Re}(X_n(\tau)), & \text{real case}, \\ \frac{|\Omega|}{2\pi m} \cdot X_n(\tau), & \text{complex case}, \end{cases}$$

it follows in either case that

$$\widehat{R}_{ss,n}(\tau) = \widehat{R}_{ss}(\tau) + \widetilde{N}(\tau), \tag{16}$$

where $\widehat{R}_{ss}(\tau)$ is as defined in (7). This function provides an unbiased estimate of the shifted, scaled autocorrelation function of $s(t)$, since $\mathrm{E}[\widehat{R}_{ss,n}(\tau)] = A R_{ss}(\tau - \tau_0)$.

Footnote: The complex Gaussian noise model means that the real and imaginary parts of each $n_k$ are independent, real-valued zero-mean Gaussian random variables with variance $\sigma_n^2/2$.

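We can gain some intuition for how the noise is hindering the estimation process with a quick estimate on its expected size at a fixed point $\tau$. In both the real and complex cases, the variance of $\widetilde{N}(\tau)$ is bounded by

$$\begin{aligned}
\mathrm{Var}\left[\widetilde{N}(\tau)\right] &\leq \frac{|\Omega|^2}{4\pi^2m^2}\,\mathrm{E}\left|\sum_{k=1}^{m} n_k\overline{\hat{s}(\omega_k)}\,e^{i\omega_k\tau}\right|^2 \\
&= \frac{|\Omega|^2}{4\pi^2m^2}\sum_{k_1=1}^{m}\sum_{k_2=1}^{m}\mathrm{E}\left[n_{k_1}\overline{n_{k_2}}\right]\mathrm{E}\left[\overline{\hat{s}(\omega_{k_1})}\,\hat{s}(\omega_{k_2})\,e^{i(\omega_{k_1} - \omega_{k_2})\tau}\right] \\
&= \frac{|\Omega|^2}{4\pi^2m^2}\sum_{k=1}^{m}\mathrm{E}|n_k|^2\,\mathrm{E}|\hat{s}(\omega_k)|^2 \\
&= \frac{\sigma_n^2\,|\Omega|\,\|\hat{s}\|_2^2}{4\pi^2m},
\end{aligned}$$

and so, using Jensen's inequality, we obtain

$$\mathrm{E}|\widetilde{N}(\tau)| \leq \frac{\sigma_n}{2\pi}\,\|\hat{s}\|_2\sqrt{\frac{|\Omega|}{m}}. \tag{17}$$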



Recall that the peak of the noiseless estimate $|\widehat{R}_{ss}(\tau)|$ is on the order of $|A|R_{ss}(0) = |A|(2\pi)^{-1}\|\hat{s}\|_2^2$. Thus the noise process will overwhelm the peak of the noiseless estimate when

$$\sigma_n \gtrsim |A|\,\|\hat{s}\|_2\sqrt{\frac{m}{|\Omega|}}. \tag{18}$$
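The comparison that leads to (18) is simple enough to encode directly. The helper names below are hypothetical, and the sketch merely restates (17), the noiseless peak $|A|\|\hat{s}\|_2^2/(2\pi)$, and the resulting threshold, so that the two can be compared for concrete parameter values.

```python
import math

def pointwise_noise_bound(sigma_n, s_norm, band, m):
    """Bound (17) on the expected noise magnitude at a fixed delay."""
    return sigma_n / (2.0 * math.pi) * s_norm * math.sqrt(band / m)

def noiseless_peak(amplitude, s_norm):
    """Peak |A| R_ss(0) = |A| ||s_hat||_2^2 / (2 pi) of the noiseless estimate."""
    return abs(amplitude) * s_norm ** 2 / (2.0 * math.pi)

def breakdown_noise_level(amplitude, s_norm, band, m):
    """Noise level (18) at which the bound (17) reaches the noiseless peak."""
    return abs(amplitude) * s_norm * math.sqrt(m / band)

# Gaussian pulse example: ||s_hat||_2^2 ~= 2*pi, |Omega| = 1200, m = 50.
s_norm = math.sqrt(2.0 * math.pi)
level = breakdown_noise_level(1.0, s_norm, 1200.0, 50)
print(level, pointwise_noise_bound(level, s_norm, 1200.0, 50), noiseless_peak(1.0, s_norm))
```

At the noise level returned by `breakdown_noise_level`, the pointwise bound and the noiseless peak coincide, which is exactly the balance expressed by (18).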



Theorems 4 and 5 below show that for $m$ large enough, essentially the same bound as (17) holds uniformly over the entire search interval $T$. As a result, the amount of noise (size of $\sigma_n$) the compressive matched filter can withstand is essentially (to within constant and log factors) the same as in (18).

We start with a bound on the expected maximum of the noise process. The following result is proved in Section 4.2.



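Theorem 4. Suppose that $|\Omega||T| \geq 3$. Then the noise process $\widetilde{N}(\tau)$, as defined in (15), obeys

$$\mathrm{E}\sup_{\tau\in T}\left|\widetilde{N}(\tau)\right| \leq \sigma_n\sqrt{\frac{|\Omega|}{m}}\,\|\hat{s}\|_2\left(0.199\sqrt{\log(|\Omega||T|)} + 0.166\right) \leq 0.36 \cdot \sigma_n\sqrt{\frac{|\Omega|}{m}}\,\|\hat{s}\|_2 \cdot \sqrt{\log(|\Omega||T|)}.$$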



The next theorem shows that, given $m$ large enough, the maximum of the noise process will not be too much larger than its mean with high probability. The following result is also proved in Section 4.2.

Theorem 5. Fix $\delta > 0$. Suppose that $|\Omega||T| \geq 3$ and that

$$m \geq C_3 \cdot \max\left(\mu_1^2, \mu_2\right) \cdot \log(1/\delta), \tag{19}$$

where $C_3$ is a known universal constant. Then the noise process $\widetilde{N}(\tau)$, as defined in (15), obeys

$$\mathrm{P}\left\{\sup_{\tau\in T}\left|\widetilde{N}(\tau)\right| > C_4 \cdot \sigma_n\sqrt{\frac{|\Omega|}{m}}\,\|\hat{s}\|_2 \cdot \max\left(\sqrt{\log(|\Omega||T|)}, \sqrt{\log(2/\delta)}\right)\right\} \leq \delta, \tag{20}$$

where $C_4$ is a known universal constant.

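These two theorems, taken in conjunction with Theorems 1 and 2, give us a bound on how far the estimate $\widehat{R}_{ss,n}(\tau)$ created from noisy samples will vary from its mean. With high probability, we will have

$$\sup_{\tau\in T}\left|\widehat{R}_{ss,n}(\tau) - A R_{ss}(\tau - \tau_0)\right| \lesssim \max\left(|A|\,\|\hat{s}\|_2^2\,\mu_1\sqrt{\frac{\log(|\Omega||T|)}{m}},\ |A|\,\|\hat{s}\|_2^2\,\mu_2\,\frac{\log(|\Omega||T|)}{m},\ \sigma_n\|\hat{s}\|_2\sqrt{\frac{|\Omega|\log(|\Omega||T|)}{m}}\right),$$

where $\lesssim$ hides constant factors and the dependence on the failure probability $\delta$.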

Just as in the noiseless case, the bound on this deviation of our estimate of the autocorre- 
lation function can be translated directly into a performance guarantee for the compressive 
matched filter. This is codified in the following corollary. 

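Corollary 6. Suppose there exist constants $\alpha_1 \in [0, 1)$ and $\alpha_2 > 0$ such that $|R_{ss}(\tau)| \leq \alpha_1R_{ss}(0)$ for all $|\tau| > \alpha_2$. Suppose also that $|\Omega||T| \geq 3$, that (19) is satisfied, and that

$$m \geq C_5 \cdot \max\left(\frac{\log(12|\Omega||T|/\delta)}{(1 - \alpha_1)^2} \cdot \mu_1^2,\ \frac{\sqrt{\log(4/\delta)\log(12|\Omega||T|/\delta)}}{1 - \alpha_1} \cdot \mu_2,\ \frac{\max\left(\log(|\Omega||T|), \log(2/\delta)\right)}{(1 - \alpha_1)^2} \cdot \frac{\sigma_n^2|\Omega|}{|A|^2\|\hat{s}\|_2^2}\right), \tag{21}$$

where $C_5$ is a known universal constant. Then with probability at least $1 - 2\delta$, the maximum value of $|\widehat{R}_{ss,n}(\tau)|$ must be attained for some $\hat{\tau}_0$ within the interval $[\tau_0 - \alpha_2, \tau_0 + \alpha_2]$.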



Proof. Using (16), we have that

$$\sup_{\tau\in T}\left|\widehat{R}_{ss,n}(\tau) - A R_{ss}(\tau - \tau_0)\right| \leq \sup_{\tau\in T}\left|\widehat{R}_{ss}(\tau) - A R_{ss}(\tau - \tau_0)\right| + \sup_{\tau\in T}\left|\widetilde{N}(\tau)\right|,$$

where $\widehat{R}_{ss}(\tau)$ is defined in (7). With probability at least $1 - 2\delta$, both (11) and (20) will be satisfied, and so we will have

$$|\widehat{R}_{ss,n}(\tau_0)| \geq |A|\frac{\|\hat{s}\|_2^2}{2\pi} - U - C_4\sigma_n\sqrt{\frac{|\Omega|}{m}}\,\|\hat{s}\|_2\max\left(\sqrt{\log(|\Omega||T|)}, \sqrt{\log(2/\delta)}\right) \tag{22}$$

and

$$|\widehat{R}_{ss,n}(\tau)| \leq \alpha_1|A|\frac{\|\hat{s}\|_2^2}{2\pi} + U + C_4\sigma_n\sqrt{\frac{|\Omega|}{m}}\,\|\hat{s}\|_2\max\left(\sqrt{\log(|\Omega||T|)}, \sqrt{\log(2/\delta)}\right) \tag{23}$$

for all $\tau$ such that $|\tau - \tau_0| > \alpha_2$. If (21) is satisfied with $C_5 = \max(16C_1^2, 4C_1, 64\pi^2C_4^2)$, then it follows that the right hand side of (22) must exceed the right hand side of (23). As with Corollary 3, one can slightly strengthen this result by choosing differing constants for the three terms appearing in (21). □

Figure 3: Estimated scaled, shifted autocorrelation function $\widehat{R}_{ss,n}(\tau)$ (solid blue line) obtained from noisy samples, and for the sake of comparison, the estimate $\widehat{R}_{ss}(\tau)$ (dashed red line) that would have been obtained without noise. For all experiments, the number of samples is $m = 50$, and the noise level is (a) $\sigma_n = 0.2 \cdot |A|\,\|\hat{s}\|_2\sqrt{m/|\Omega|}$, (b) $\sigma_n = 0.5 \cdot |A|\,\|\hat{s}\|_2\sqrt{m/|\Omega|}$, and (c) $\sigma_n = |A|\,\|\hat{s}\|_2\sqrt{m/|\Omega|}$. Overall, the time-of-arrival estimation is reliable in the first case, tenuous in the second case, and completely unreliable in the third case.



To within a constant factor, the first two terms in (21) are the same as in Corollary 3; we might think of these terms as "activation" conditions for when the compressive matched filter will be well-behaved in the absence of noise. After these conditions are met, it can withstand noise levels up to a size

$$\sigma_n \lesssim (1 - \alpha_1)\,|A|\,\|\hat{s}\|_2\sqrt{\frac{m}{|\Omega|\max\left(\log(|\Omega||T|), \log(2/\delta)\right)}}. \tag{24}$$

We can interpret (18) as the noise level at which the operation of the compressive matched filter will fall apart completely, and (24) (which is only a log factor smaller) as the noise level at which we have guaranteed accuracy.

Three examples of the estimated autocorrelation function $\widehat{R}_{ss,n}(\tau)$ are shown in Figure 3 for the Gaussian pulse example from Section 2.3 with a fixed number $m = 50$ of samples and various values of $\sigma_n$ (the noiseless estimates $\widehat{R}_{ss}(\tau)$ are overlaid). For the same number of measurements, Figure 4 shows the average performance of the compressive matched filter versus the noise level. For various noise levels $\sigma_n$ between $0$ and $|A|\,\|\hat{s}\|_2\sqrt{m/|\Omega|}$, we run 1000 experiments generating random sample frequencies and random noise, and estimate $\tau_0$ by identifying the peak of $|\widehat{R}_{ss,n}(\tau)|$. The figure indicates the percentage of trials in which the delay was estimated to within a distance $2\sigma$ of the correct value $\tau_0$. We see in these experiments that the estimator begins to lose effectiveness roughly when $\sigma_n \approx 0.25 \cdot |A|\,\|\hat{s}\|_2\sqrt{m/|\Omega|}$.
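A scaled-down version of this Monte Carlo experiment can be sketched as follows. This is not the code used to produce Figure 4: the number of trials, the candidate grid, the seed, and the function names are illustrative choices, and we use $\|\hat{s}\|_2^2 \approx 2\pi$ for the Gaussian pulse. The overall rescaling by $|\Omega|/(2\pi m)$ is omitted because it does not move the peak.

```python
import cmath
import math
import random

def gaussian_spectrum(omega, sigma):
    """Fourier transform of the unit-energy Gaussian pulse (12)."""
    return (4.0 * math.pi) ** 0.25 * math.sqrt(sigma) * math.exp(-(sigma * omega) ** 2 / 2.0)

def success_rate(c, trials=20, m=50, sigma=1.0 / 200.0, tau0=0.4, amplitude=1.0,
                 grid=400, seed=0):
    """Fraction of trials whose peak lies within 2*sigma of tau0.

    The noise level is sigma_n = c * |A| * ||s_hat||_2 * sqrt(m / |Omega|).
    """
    rng = random.Random(seed)
    band = 6.0 / sigma
    sigma_n = c * abs(amplitude) * math.sqrt(2.0 * math.pi) * math.sqrt(m / band)
    taus = [i / grid for i in range(grid + 1)]
    hits = 0
    for _ in range(trials):
        freqs = [rng.uniform(-band / 2.0, band / 2.0) for _ in range(m)]
        weights = [gaussian_spectrum(w, sigma) for w in freqs]
        # Complex Gaussian noise: real and imaginary parts each have variance sigma_n^2 / 2.
        y = [amplitude * cmath.exp(-1j * w * tau0) * s
             + complex(rng.gauss(0.0, sigma_n / math.sqrt(2.0)),
                       rng.gauss(0.0, sigma_n / math.sqrt(2.0)))
             for w, s in zip(freqs, weights)]
        best_tau, best_val = taus[0], -1.0
        for tau in taus:
            x = sum(yk * s * cmath.exp(1j * w * tau) for yk, s, w in zip(y, weights, freqs))
            if abs(x.real) > best_val:          # real case: peak of |Re <y, psi_tau>|
                best_tau, best_val = tau, abs(x.real)
        hits += abs(best_tau - tau0) <= 2.0 * sigma
    return hits / trials
```

Calling `success_rate(c)` for a few values of `c` between 0 and 1 gives a rough, small-sample analogue of the curve in Figure 4; with only a handful of trials the numbers will be noisy, so they should be read qualitatively.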

It is worth recalling that a user may have some control over selecting the observation interval $\Omega$. In most cases it would be natural to choose $\Omega$ roughly equal to the essential bandwidth of $s_0$. Taking $\Omega$ larger than this will increase $\mu_1$, $\mu_2$, and the sensitivity to noise; taking $\Omega$ smaller than this will generally increase the width of the main lobe of the autocorrelation function $R_{ss}(\tau)$ and thus limit the resolution to which $\tau_0$ can be estimated.



Finally, we can compare the noise levels in (18) and (24) to the noise levels at which a digital matched filter working from a set of samples taken in the time domain at the Nyquist rate $|\Omega|/2\pi$ will stop being effective. Suppose we sample the (bandlimited) return signal $A s(t - \tau_0)$ at the Nyquist rate, and noise is added to these samples. We observe

$$y_d[\ell] = A s_{d,\tau_0}[\ell] + n_d[\ell], \quad \text{where } s_{d,\tau_0}[\ell] = s(t - \tau_0)\big|_{t = 2\pi\ell/|\Omega|},$$

and $n_d[\ell]$ is a sequence of independent zero-mean Gaussian random variables with variances $\sigma_n^2|\Omega|/2\pi$. This variance is chosen to make the noise process similar to that analyzed in the compressive case; it corresponds to samples of a continuous-time process that has a power spectral density equal to $\sigma_n^2$ on $\Omega$ and zero elsewhere.

Figure 4: Percentage of correct time-of-arrival estimation ($|\hat{\tau}_0 - \tau_0| \leq 2\sigma$) over 1000 trials, as a function of the noise level $\sigma_n = c \cdot |A|\,\|\hat{s}\|_2\sqrt{m/|\Omega|}$, $0 \leq c \leq 1$.

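Focusing just on the complex case for the sake of brevity, once we have collected $y_d$, we can estimate the scaled, shifted autocorrelation function using

$$R_d(\tau) = \langle y_d, s_{d,\tau} \rangle = A\langle s_{d,\tau_0}, s_{d,\tau} \rangle + \langle n_d, s_{d,\tau} \rangle \tag{25}$$

and choose as our estimate of $\tau_0$ the maximizer of $|R_d(\tau)|$ over all $\tau$. At the correct shift $\tau_0$, the first inner product in (25) is given by

$$\langle s_{d,\tau_0}, s_{d,\tau_0} \rangle = \sum_{\ell}|s_{d,\tau_0}[\ell]|^2 = \frac{|\Omega|}{2\pi}\|s(t - \tau_0)\|_2^2 = \frac{|\Omega|}{4\pi^2}\|\hat{s}\|_2^2,$$

where the second equality comes from the fact that $s(\cdot)$ is bandlimited and we are sampling at the Nyquist rate. The second inner product is a Gaussian random variable with

$$\mathrm{Var}\left[\langle n_d, s_{d,\tau} \rangle\right] = \frac{\sigma_n^2|\Omega|}{2\pi}\sum_{\ell}|s_{d,\tau}[\ell]|^2 = \frac{\sigma_n^2|\Omega|^2}{8\pi^3}\|\hat{s}\|_2^2,$$

and so

$$\mathrm{E}\left|\langle n_d, s_{d,\tau} \rangle\right| \leq \frac{\sigma_n|\Omega|}{(2\pi)^{3/2}} \cdot \|\hat{s}\|_2.$$

Roughly speaking, then, the Nyquist sampled matched filter will be overwhelmed by the noise when

$$\sigma_n \gtrsim |A|\,\|\hat{s}\|_2. \tag{26}$$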



Comparing (26) to the compressive matched filter results (18) and (24), we can interpret the factor of $\sqrt{m/|\Omega|}$ as a sort of undersampling penalty; as the number of samples gets smaller, the noise tolerance gets worse. When $m \gtrsim |\Omega|$ the performance of the two schemes will be similar. (A similar undersampling penalty arises in standard CS [8], where the noise variance that can be tolerated for a given recovery error decreases as the number of measurements gets smaller.)

3 Pure Tone Estimation



As discussed in Section 1.2.3, the roles of time and frequency are completely interchangeable in our setting. This allows us to apply the compressive matched filter to the problem of finding the carrier frequency of a modulated signal from time-domain samples with minimal effort. One particular such application is studied here: estimating the frequency of a pure tone from random samples in time.

Formally, the problem under study is described as follows. A pure exponential $A e^{i\omega_0t}$ with fixed (but unknown) frequency $\omega_0 \in \Omega$, amplitude $|A|$, and phase $\angle A$ is observed on $T = [-t_{\max}, t_{\max}]$. Let $y$ be the vector of observations at sampling times $t_1, t_2, \ldots, t_m \in T$, which are randomly chosen from a uniform distribution on $T$, i.e.,

$$y = A\left[e^{i\omega_0t_1},\ e^{i\omega_0t_2},\ \ldots,\ e^{i\omega_0t_m}\right]^T.$$

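Given $y \in \mathbb{C}^m$, we are interested in estimating $\omega_0 \in \Omega$ and $A \in \mathbb{C}$. A natural approach to solving this problem is to find the $\hat{\omega}_0$ and $\hat{A}$ which best explain the measurements in a least-squares sense. More formally, we define

$$(\hat{\omega}_0, \hat{A}) := \arg\min_{\omega\in\Omega,\,A\in\mathbb{C}}\sum_{k=1}^{m}\left|y[k] - A \cdot e^{i\omega t_k}\right|^2 = \arg\min_{\omega\in\Omega,\,A\in\mathbb{C}}\|y - A\psi_\omega\|_2^2, \tag{27}$$

where for any $\omega \in \Omega$, the test vector $\psi_\omega \in \mathbb{C}^m$ is given by

$$\psi_\omega[k] = e^{i\omega t_k}, \quad k = 1, 2, \ldots, m.$$

The least-squares solution for $\omega_0$ is given by

$$\hat{\omega}_0 = \arg\max_{\omega\in\Omega}\left|\langle y, \psi_\omega \rangle\right|, \tag{28}$$

and subsequent to estimating $\omega_0$, the least-squares estimate for $A$ can be computed as

$$\hat{A} = \langle y, \psi_{\hat{\omega}_0} \rangle\,\|\psi_{\hat{\omega}_0}\|_2^{-2}.$$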
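As a concrete illustration of the correlation-based estimator, the sketch below searches a finite grid of candidate frequencies rather than all of $\Omega$. The grid, the seed, the tone parameters, and the function names are illustrative assumptions of ours; the true frequency is placed on the grid so that the noiseless peak is found exactly.

```python
import cmath
import random

def correlate(y, times, omega):
    """Inner product <y, psi_omega> with psi_omega[k] = exp(i * omega * t_k)."""
    return sum(yk * cmath.exp(-1j * omega * tk) for yk, tk in zip(y, times))

def estimate_tone(y, times, candidates):
    """Grid version of the correlation estimator; returns (omega_hat, A_hat)."""
    omega_hat = max(candidates, key=lambda w: abs(correlate(y, times, w)))
    # ||psi_omega||_2^2 = m because every entry has unit modulus.
    a_hat = correlate(y, times, omega_hat) / len(y)
    return omega_hat, a_hat

rng = random.Random(1)
t_max, m = 1.0, 30
times = [rng.uniform(-t_max, t_max) for _ in range(m)]
true_omega, true_amp = 37.0, 0.8 * cmath.exp(0.3j)
y = [true_amp * cmath.exp(1j * true_omega * t) for t in times]
candidates = [0.5 * k for k in range(-200, 201)]    # grid on Omega = [-100, 100]
omega_hat, a_hat = estimate_tone(y, times, candidates)
print(omega_hat, a_hat)
```

The amplitude estimate divides by $m$ because $\|\psi_\omega\|_2^2 = m$ for every $\omega$, which is also why maximizing the correlation magnitude is equivalent to minimizing the least-squares residual.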



3.1 Analytical Framework 



Equation (28) suggests a correlation-based strategy for estimating the unknown frequency $\omega_0$. In order to study the performance of such an estimator, let us define the random process $X(\omega) := \langle y, \psi_\omega \rangle$ on $\Omega$, which has the mean function

$$\begin{aligned}
\mathrm{E}X(\omega) &= A\,\mathrm{E}\sum_{k=1}^{m}e^{i\omega_0t_k}e^{-i\omega t_k} \\
&= A\sum_{k=1}^{m}\mathrm{E}\,e^{i(\omega_0 - \omega)t_k} \\
&= mA|T|^{-1}\int_T e^{i(\omega_0 - \omega)t}\,dt \\
&= mA|T|^{-1} \cdot |T|\,\mathrm{sinc}\left(\tfrac{1}{2}|T|(\omega_0 - \omega)\right),
\end{aligned}$$

where $\mathrm{sinc}(\alpha) := \sin(\alpha)/\alpha$. One way to interpret $X(\omega)$ is that we have approximated the continuous-time inner product between two time-limited complex sinusoids as a discrete sum with samples taken at random locations; the above tells us that this estimate is unbiased.
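The closed form of the mean function can be sanity-checked by evaluating the integral numerically. The sketch below assumes the symmetric observation interval $T = [-t_{\max}, t_{\max}]$ (so that $|T| = 2t_{\max}$); the function names and the test values are hypothetical.

```python
import cmath
import math

def sinc(a):
    """Unnormalized sinc, sin(a)/a, with the removable singularity filled in."""
    return 1.0 if a == 0.0 else math.sin(a) / a

def mean_correlation(amplitude, m, t_max, omega0, omega, n=20000):
    """Midpoint-rule value of m * A * |T|^{-1} * integral_T exp(i(omega0 - omega)t) dt."""
    width = 2.0 * t_max
    step = width / n
    total = sum(cmath.exp(1j * (omega0 - omega) * (-t_max + (i + 0.5) * step))
                for i in range(n)) * step
    return m * amplitude * total / width

def closed_form(amplitude, m, t_max, omega0, omega):
    """m * A * sinc(|T| (omega0 - omega) / 2) for T = [-t_max, t_max]."""
    return m * amplitude * sinc(0.5 * 2.0 * t_max * (omega0 - omega))

numeric = mean_correlation(0.8, 30, 1.0, 37.0, 35.5)
exact = closed_form(0.8, 30, 1.0, 37.0, 35.5)
print(numeric, exact)
```

The imaginary part of the numerical integral vanishes (up to rounding) because the interval is symmetric about the origin, which is what makes the mean a real multiple of $A$.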

Further attention reveals that we are facing the same problem as in the complex case of Section 2, where the roles of time and frequency have been interchanged: the frequency domain becomes the "shift domain," while the time domain becomes the "observation domain." More precisely, we may define $\hat{s}_0(\omega) = 2\pi\delta(\omega)$, which has the inverse Fourier transform $s_0(t) = 1$. Our received signal can be expressed in the frequency domain as $A \cdot \hat{s}_0(\omega - \omega_0)$ for some $\omega_0 \in \Omega$. However, we will observe $m$ samples of this signal in the time domain, acquiring values of $A e^{i\omega_0t}s_0(t) = A e^{i\omega_0t}$ at times $t_1, t_2, \ldots, t_m \in T$.

Now, in the observation (time) domain, we define $s(t)$ to be the time-limited version of $s_0(t)$, i.e., $s(t) := \mathbf{1}_{t\in T}$, where $\mathbf{1}$ denotes the indicator function. Returning to the shift (frequency) domain, we have $\hat{s}(\omega) = |T|\,\mathrm{sinc}\left(\tfrac{1}{2}|T|\omega\right)$. Up to a constant factor, this expression equals its own autocorrelation function, i.e., $R_{\hat{s}\hat{s}}(\omega) = 2\pi\hat{s}(\omega)$.

Therefore, we can estimate the ideal autocorrelation function $R_{\hat{s}\hat{s}}(\cdot)$ (up to the unknown complex amplitude $A$ and translation $\omega_0$) by rescaling the random process $X(\omega)$:

$$\widehat{R}_{\hat{s}\hat{s}}(\omega) := \frac{2\pi|T|}{m}X(\omega). \tag{29}$$

This estimate is unbiased since $\mathrm{E}[\widehat{R}_{\hat{s}\hat{s}}(\omega)] = A \cdot R_{\hat{s}\hat{s}}(\omega - \omega_0)$. It is clear that solving the least-squares problem (28) is equivalent to searching for the maximizer of $|\widehat{R}_{\hat{s}\hat{s}}(\omega)|$. Since the main lobe of $R_{\hat{s}\hat{s}}(\omega) = 2\pi|T|\,\mathrm{sinc}\left(\tfrac{1}{2}|T|\omega\right)$ is centered at the origin (with $R_{\hat{s}\hat{s}}(0) = 2\pi|T|$), we informally expect that, on average, finding the maximum of $|\widehat{R}_{\hat{s}\hat{s}}(\omega)|$ correctly estimates $\omega_0$.
3.2 Noiseless Analysis 

To study the concentration of $\widehat{R}_{\hat{s}\hat{s}}(\omega)$ about its mean, we may follow the same arguments as in Sections 2.2 and 2.4 while simply exchanging the roles of time and frequency. In particular, we note that the problem of pure tone estimation corresponds to the first "special case" studied in Section 2.4, because the windowed signal template has uniform magnitude in the observation domain. Thus, we have $\mu_1 = \mu_2 = 1$. This leads us to the following result for the case of noiseless observations.



Corollary 7. Fix $\delta > 0$ and let

$$U = 2\pi C_1|A||T| \cdot \max\left\{\frac{1}{\sqrt{m}},\ \frac{1}{m}\sqrt{\log(4/\delta)}\right\} \cdot \sqrt{\log(12|\Omega||T|/\delta)}.$$

If $|\Omega||T| \geq 3$, then the estimate of the autocorrelation function in (29) obeys

$$\mathrm{P}\left\{\sup_{\omega\in\Omega}\left|\widehat{R}_{\hat{s}\hat{s}}(\omega) - A R_{\hat{s}\hat{s}}(\omega - \omega_0)\right| > U\right\} \leq \delta.$$

This corollary follows immediately from Theorem 2.

A close inspection of the definition of $\widehat{R}_{\hat{s}\hat{s}}(\omega)$ reveals that $\omega_0$ is guaranteed to be a maximizer of $|\widehat{R}_{\hat{s}\hat{s}}(\omega)|$ with $|\widehat{R}_{\hat{s}\hat{s}}(\omega_0)| = 2\pi|A||T|$. What Corollary 7 ensures is that even for small values of $m$, no other values of $\omega$ far from $\omega_0$ can give $|\widehat{R}_{\hat{s}\hat{s}}(\omega)|$ equal (or even close) to $|\widehat{R}_{\hat{s}\hat{s}}(\omega_0)|$. This fact is not only useful when we introduce nonidealities into the observation process (see Section 3.4) but also in guiding a computational method to search for the peak of $|\widehat{R}_{\hat{s}\hat{s}}(\omega)|$. We investigate this issue in Section 3.3 below.



3.3 A Grid Search Approach 



In practice, in order to find the peak of $|\widehat{R}_{\hat{s}\hat{s}}(\omega)|$, one might hope to simply sample this function over a uniformly spaced grid of frequencies drawn from $\Omega$. Because $\widehat{R}_{\hat{s}\hat{s}}(\omega)$ is guaranteed to remain close to $A R_{\hat{s}\hat{s}}(\omega - \omega_0)$, which decays sharply away from $\omega_0$, it is possible to guarantee that as long as the grid is chosen sufficiently fine, then the empirical maximum over the grid points will occur very close to the true peak.

To illustrate this fact with some specific but arbitrary values, let us note that for $|\omega| \leq \pi|T|^{-1}$, $|R_{\hat{s}\hat{s}}(\omega)| \geq 0.636 \cdot 2\pi|T|$. Moreover, for $|\omega| \geq 2\pi|T|^{-1}$, $|R_{\hat{s}\hat{s}}(\omega)| \leq 0.218 \cdot 2\pi|T|$. Following the techniques used to prove Corollary 3, we can ensure that $0.636 \cdot 2\pi|A||T| - U > 0.218 \cdot 2\pi|A||T| + U$ with probability at least $1 - \delta$ by taking

$$m \geq C_2 \cdot \max\left(\frac{\log(12|\Omega||T|/\delta)}{(0.636 - 0.218)^2},\ \frac{\sqrt{\log(4/\delta)\log(12|\Omega||T|/\delta)}}{0.636 - 0.218}\right).$$

It follows that, if we initially search for the maximum of $|\widehat{R}_{\hat{s}\hat{s}}(\omega)|$ on a grid with resolution $2\pi|T|^{-1}$ (note that this is the so-called grid of Nyquist frequencies, given $T$), we are guaranteed that the empirical maximum will occur at a grid point $\hat{\omega}_0$ such that $|\hat{\omega}_0 - \omega_0| < 2\pi|T|^{-1}$.
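The two numerical constants used above follow from elementary properties of the sinc function and are easy to confirm. In the sketch below, the argument of sinc is $a = \tfrac{1}{2}|T|\omega$, so that $|\omega| \leq \pi|T|^{-1}$ corresponds to $|a| \leq \pi/2$ and $|\omega| \geq 2\pi|T|^{-1}$ corresponds to $|a| \geq \pi$; the scan range and step are our own choices.

```python
import math

def sinc(a):
    """Unnormalized sinc, sin(a)/a."""
    return 1.0 if a == 0.0 else math.sin(a) / a

# Main-lobe floor: sinc is decreasing on [0, pi], so its minimum over
# |a| <= pi/2 is attained at the endpoint a = pi/2, where it equals 2/pi.
main_lobe_floor = sinc(math.pi / 2.0)

# Sidelobe ceiling: scan |sinc| on a fine grid over [pi, 40*pi]; beyond that
# range |sinc(a)| <= 1/a is already below 0.01.
steps = 400000
sidelobe_ceiling = max(abs(sinc(math.pi + 39.0 * math.pi * i / steps))
                       for i in range(steps + 1))

print(main_lobe_floor, sidelobe_ceiling)
```

The gap between these two values, roughly $0.636 - 0.218$, is what appears in the denominators of the measurement bound above.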

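After this initial grid search, it is actually straightforward to refine the accuracy of the estimate $\hat{\omega}_0$ using a local concave ascent. Note that

$$\begin{aligned}
\left|\widehat{R}_{\hat{s}\hat{s}}(\omega)\right|^2 &= \left(\frac{2\pi|T|}{m}\right)^2\left|\sum_{k=1}^{m}A e^{i(\omega_0 - \omega)t_k}\right|^2 \\
&= \left(\frac{2\pi|T||A|}{m}\right)^2\sum_{k=1}^{m}\sum_{\ell=1}^{m}e^{i(\omega_0 - \omega)(t_k - t_\ell)} \\
&= \left(\frac{2\pi|T||A|}{m}\right)^2\sum_{k=1}^{m}\sum_{\ell=1}^{m}\cos\left((\omega_0 - \omega)(t_k - t_\ell)\right).
\end{aligned}$$

Since $|t_k - t_\ell| \leq |T|$, $|\widehat{R}_{\hat{s}\hat{s}}(\omega)|^2$ is guaranteed to be a concave function of $\omega$ when $|\omega_0 - \omega| \leq \frac{\pi}{2} \cdot |T|^{-1}$. Therefore, if we have an estimate $\hat{\omega}_0$ sufficiently close to the true $\omega_0$ (i.e., $|\hat{\omega}_0 - \omega_0| \leq \frac{\pi}{2} \cdot |T|^{-1}$), a standard concave maximization (akin to convex minimization) procedure beginning at $\hat{\omega}_0$ will give us the exact value for $\omega_0$. Since the grid search above guarantees that $|\hat{\omega}_0 - \omega_0| < 2\pi|T|^{-1}$, one could ensure success by running four concave maximizations starting from the points $\hat{\omega}_0 \pm \frac{\pi}{2} \cdot |T|^{-1}$ and $\hat{\omega}_0 \pm \frac{3\pi}{2} \cdot |T|^{-1}$.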
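The refinement step can be sketched as follows. To keep the example self-contained we substitute a plain ternary search for a general-purpose concave maximization routine; this relies on the observation that each cosine term above is decreasing in $|\omega - \omega_0|$ as long as $|\omega - \omega_0| \leq \pi|T|^{-1}$, so the objective is unimodal on any interval of half-width $\frac{\pi}{2}|T|^{-1}$ whose center is within $\frac{\pi}{2}|T|^{-1}$ of $\omega_0$. The function names, the coarse estimate, and the test values are hypothetical.

```python
import cmath
import math
import random

def objective(omega, y, times):
    """|<y, psi_omega>|^2, proportional to the squared autocorrelation estimate."""
    return abs(sum(yk * cmath.exp(-1j * omega * tk) for yk, tk in zip(y, times))) ** 2

def ternary_max(f, lo, hi, iters=100):
    """Maximize a unimodal function on [lo, hi] by ternary search."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)

def refine(coarse, y, times, t_len):
    """Run four local searches around the coarse grid estimate; keep the best."""
    r = math.pi / (2.0 * t_len)

    def f(w):
        return objective(w, y, times)

    starts = [coarse - 3.0 * r, coarse - r, coarse + r, coarse + 3.0 * r]
    return max((ternary_max(f, c - r, c + r) for c in starts), key=f)

rng = random.Random(2)
times = [rng.uniform(-1.0, 1.0) for _ in range(30)]       # T = [-1, 1], |T| = 2
y = [cmath.exp(1j * 37.3 * t) for t in times]              # noiseless tone at omega0 = 37.3
refined = refine(36.0, y, times, t_len=2.0)                # coarse estimate within 2*pi/|T|
print(refined)
```

Only one of the four searches is guaranteed to start close enough to $\omega_0$; the others may return arbitrary points, which is why the candidate with the largest objective value is kept.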



3.4 Robustness 



It is also possible to consider nonidealities in the observation process. Following the same set of arguments as in Section 2.5 (but exchanging the roles of time and frequency), we arrive at the following result.

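Corollary 8. Let $N(\omega)$ denote the random process induced by additive complex-valued Gaussian measurement noise having variance $\sigma_n^2$, and define

$$\widehat{R}_{\hat{s}\hat{s},n}(\omega) := \frac{2\pi|T|}{m}\left(X(\omega) + N(\omega)\right)$$

to be the estimate of the autocorrelation function formed using the noisy samples. Let $\delta > 0$. Suppose that $|\Omega||T| \geq 3$, that $m \geq C_3\log(1/\delta)$, and that

$$m \geq C_5 \cdot \max\left(\frac{\log(12|\Omega||T|/\delta)}{(1 - 0.218)^2},\ \frac{\sqrt{\log(4/\delta)\log(12|\Omega||T|/\delta)}}{1 - 0.218},\ \frac{\max\left(\log(|\Omega||T|), \log(2/\delta)\right)}{(1 - 0.218)^2} \cdot \frac{\sigma_n^2}{|A|^2}\right).$$

Then with probability at least $1 - 2\delta$, the maximum value of $|\widehat{R}_{\hat{s}\hat{s},n}(\omega)|$ must be attained for some $\hat{\omega}_0$ within the interval $[\omega_0 - 2\pi|T|^{-1}, \omega_0 + 2\pi|T|^{-1}]$.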



Finally, let us note that with some additional work, we believe it would be possible to extend our analysis to account for multiple tones (or multiple translated pulses in the context of Section 2). The problem becomes that of detecting the true peaks in a noisy sum of sinc functions. For tones that are well-separated, one could argue that the interference in the random process is minimal and that any prominent peak in $|\widehat{R}_{\hat{s}\hat{s},n}(\omega)|$ indicates the presence of a tone. Tones that are very close may be impossible to discriminate (this is true even with Nyquist-rate samples), while tones that are moderately separated may be possible to discriminate by employing a greedy, iterative estimation procedure.



3.5 Stylized Application: Chirp Time-of-Arrival Estimation 

We close by noting that the ability to estimate a pure tone's frequency from random time samples can also be parlayed into a technique for estimating a chirp signal's time-of-arrival from random time samples. For this discussion, suppose we receive a chirp signal

$$x(t) = A\exp\left(i\left(\omega_c(t - t_0) + \frac{\alpha}{2}(t - t_0)^2\right)\right)$$

over some time interval, where $\omega_c$ denotes the known starting frequency, $\alpha$ denotes the known chirp rate, $A$ denotes the complex amplitude, and $t_0$ denotes the unknown time-of-arrival. We can "de-chirp" this signal over this interval, computing

$$\widetilde{x}(t) = x(t)\exp\left(-i\left(\omega_ct + \frac{\alpha}{2}t^2\right)\right) = \widetilde{A}\exp(-i\alpha t_0t),$$

where $\widetilde{A} = A\exp\left(i\left(\frac{\alpha}{2}t_0^2 - \omega_ct_0\right)\right)$ is a complex amplitude. The signal $\widetilde{x}(t)$ is merely a complex sinusoid (in this case, with frequency $-\alpha t_0$, which is proportional to the unknown $t_0$). We have argued in this section that it is possible to estimate a pure tone's frequency from random samples in time, and in this case that means that it is possible to estimate the time-of-arrival parameter $t_0$ from random samples of $\widetilde{x}(t)$ in time. It is important to note that time samples of $\widetilde{x}(t)$ can be computed easily from time samples of $x(t)$ itself, since the two signals are related via point-wise multiplication.
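The de-chirping idea can be illustrated with a short noiseless example. This is a sketch under our own sign convention (multiplying by $\exp(-i(\omega_ct + \frac{\alpha}{2}t^2))$, so that the de-chirped tone has frequency $-\alpha t_0$); the parameter values, the candidate grid, and the function names are hypothetical.

```python
import cmath
import random

def chirp(t, t0, amp, omega_c, alpha):
    """Received chirp A * exp(i * (omega_c * (t - t0) + alpha / 2 * (t - t0)^2))."""
    return amp * cmath.exp(1j * (omega_c * (t - t0) + 0.5 * alpha * (t - t0) ** 2))

def estimate_arrival(samples, times, omega_c, alpha, candidates):
    """De-chirp the samples, then pick the delay whose tone best matches them."""
    dechirped = [x * cmath.exp(-1j * (omega_c * t + 0.5 * alpha * t * t))
                 for x, t in zip(samples, times)]

    # After de-chirping, the samples follow a tone proportional to exp(-i * alpha * t0 * t).
    def score(t0):
        return abs(sum(d * cmath.exp(1j * alpha * t0 * t) for d, t in zip(dechirped, times)))

    return max(candidates, key=score)

rng = random.Random(3)
times = [rng.uniform(0.0, 1.0) for _ in range(40)]
samples = [chirp(t, t0=0.25, amp=1.0 + 0.5j, omega_c=50.0, alpha=400.0) for t in times]
candidates = [i / 100 for i in range(101)]
t0_hat = estimate_arrival(samples, times, 50.0, 400.0, candidates)
print(t0_hat)
```

Each candidate delay corresponds to one candidate tone frequency, so this is the same correlation search as in the pure tone problem, expressed directly in terms of $t_0$.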
4 Theory

4.1 Proofs of Theorems 1 and 2 (Noiseless Analysis)

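Let us begin by noting that, in the real case, both Theorems 1 and 2 are concerned with bounding

$$\begin{aligned}
\left|\widehat{R}_{ss}(\tau) - A R_{ss}(\tau - \tau_0)\right| &= \left|\widehat{R}_{ss}(\tau) - A \cdot \mathrm{Re}(R_{ss}(\tau - \tau_0))\right| \\
&= \left|\frac{|\Omega|}{2\pi m} \cdot \mathrm{Re}(X(\tau)) - \frac{|\Omega|}{2\pi m} \cdot \mathrm{Re}(\mathrm{E}X(\tau))\right| \\
&= \frac{|\Omega|}{2\pi m}\left|\mathrm{Re}(X(\tau) - \mathrm{E}X(\tau))\right| \\
&\leq \frac{|\Omega|}{2\pi m}\left|X(\tau) - \mathrm{E}X(\tau)\right|.
\end{aligned} \tag{30a}$$

In the complex case, both theorems are concerned with bounding

$$\left|\widehat{R}_{ss}(\tau) - A R_{ss}(\tau - \tau_0)\right| = \frac{|\Omega|}{2\pi m}\left|X(\tau) - \mathrm{E}X(\tau)\right|. \tag{30b}$$

Thus, to cover both cases, it suffices to focus on bounding $|X(\tau) - \mathrm{E}X(\tau)|$.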

4.1.1 Setup 

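The first step in our approach to bounding $|X(\tau) - \mathrm{E}X(\tau)|$ is to define the centered process

$$Y(\tau) := X(\tau) - \mathrm{E}X(\tau) = A\sum_{k=1}^{m}|\hat{s}(\omega_k)|^2e^{i\omega_k(\tau - \tau_0)} - 2\pi Am|\Omega|^{-1}R_{ss}(\tau - \tau_0).$$

Our goal is to bound $\sup_{\tau\in T}|Y(\tau)|$, but to do this, we relate the random process to one that is more easily bounded. First, we symmetrize $Y(\tau)$ in the standard way. Create an independent copy $Y'(\tau)$ (generated from an independent set of samples $\omega'_1, \omega'_2, \ldots, \omega'_m$), and define

$$Z(\tau) := Y(\tau) - Y'(\tau) = A\sum_{k=1}^{m}\left(|\hat{s}(\omega_k)|^2e^{i\omega_k(\tau - \tau_0)} - |\hat{s}(\omega'_k)|^2e^{i\omega'_k(\tau - \tau_0)}\right). \tag{31}$$

Each term in (31) is a symmetric random variable, and so $Z(\tau)$ has the same distribution as

$$Z'(\tau) := A\sum_{k=1}^{m}\epsilon_k\left(|\hat{s}(\omega_k)|^2e^{i\omega_k(\tau - \tau_0)} - |\hat{s}(\omega'_k)|^2e^{i\omega'_k(\tau - \tau_0)}\right),$$

where $\epsilon_1, \epsilon_2, \ldots, \epsilon_m$ is a Rademacher sequence independent of everything else.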

We can control $\mathrm{E}\sup_{\tau\in T}|Y(\tau)|$ through $\mathrm{E}\sup_{\tau\in T}|Z'(\tau)|$ using the following simple result, which is proved in Appendix A.

Footnote: A Rademacher sequence is a sequence of independent random variables taking $\pm 1$ values with equal probabilities.

Lemma 9. $\mathrm{E}\sup_{\tau\in T}|Y(\tau)| \leq \mathrm{E}\sup_{\tau\in T}|Z'(\tau)|$.

Furthermore, the deviation of $\sup_{\tau\in T}|Y(\tau)|$ from its average can be controlled through the corresponding deviation of $\sup_{\tau\in T}|Z'(\tau)|$. The following is proved in Appendix B.

Lemma 10. For any $\lambda > 0$,

$$\mathrm{P}\left\{\sup_{\tau\in T}|Y(\tau)| > 2\,\mathrm{E}\sup_{\tau\in T}|Y(\tau)| + \lambda\right\} \leq 2\,\mathrm{P}\left\{\sup_{\tau\in T}|Z'(\tau)| > \lambda\right\}.$$

The above results allow us to focus on developing expectation and tail bounds for $\sup_{\tau\in T}|Z'(\tau)|$.
We establish such bounds in the following subsections. 

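To ease the notation below, we make the following definitions for quantities that will appear often:

$$M = M(s) := \sup_{\omega\in\Omega}|\hat{s}(\omega)|^2$$

and

$$\begin{aligned}
M_1 = M_1(s, m, \Omega) &:= \ldots, \\
M_2 = M_2(s, m, \Omega) &:= \ldots, \\
M_3 = M_3(s, \omega_1, \omega_2, \ldots, \omega_m, \omega'_1, \omega'_2, \ldots, \omega'_m) &:= \sqrt{\sum_{k=1}^{m}\left(|\hat{s}(\omega_k)|^4 + |\hat{s}(\omega'_k)|^4\right)}, \\
M_4 = M_4(s, \omega_1, \omega_2, \ldots, \omega_m) &:= \ldots
\end{aligned}$$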

We will also frequently use the following convenient facts. For any $a$ and $b$, we have

$$e^{ia} - e^{ib} = 2i\sin\left(\frac{a - b}{2}\right)e^{i(a+b)/2} \tag{32}$$

and

$$|a + b|^2 \leq 2|a|^2 + 2|b|^2. \tag{33}$$

Also, for any $c, u > 0$, the following inequality follows from a standard Gaussian tail bound [19]:

$$\int_{x>u}e^{-x^2/c}\,dx \leq \frac{c}{2u}e^{-u^2/c}. \tag{34}$$



4.1.2 Chaining 

We start by bounding $\sup_{\tau\in T}|Z'(\tau)|$ conditioned on the choice of $\{\omega_k\}$ and $\{\omega'_k\}$. To this end, we will use a chaining argument similar to what is used to prove the general Dudley inequality [12], but optimized for our particular process (this will allow us to tightly control the constants).

The following tail bounds for $Z'(\tau)$ and its increments are proved in Appendix C.

Lemma 11. For a fixed $\tau \in \mathbb{R}$ and any $\lambda > 0$, $Z'(\tau)$ obeys

$$\mathrm{P}_\epsilon\left\{|Z'(\tau)| > \lambda\right\} \leq 2\exp\left(-\frac{\lambda^2}{4|A|^2M_3^2}\right), \tag{35}$$

where $\mathrm{P}_\epsilon$ denotes probability with respect to $\{\epsilon_k\}$ conditioned on fixed $\{\omega_k\}$ and $\{\omega'_k\}$. Also, for fixed $\tau_1, \tau_2 \in \mathbb{R}$,

$$\mathrm{P}_\epsilon\left\{|Z'(\tau_1) - Z'(\tau_2)| > \lambda\right\} \leq 2\exp\left(-\frac{\lambda^2}{|A|^2M_3^2|\Omega|^2|\tau_1 - \tau_2|^2}\right). \tag{36}$$

We will consider the values of $Z'(\tau)$ on a series of discrete grids of points that are essentially localized on the interval $T = [\tau_{\min}, \tau_{\max}]$. For each integer $j \geq 0$, let $T_j$ be a grid of points spaced $2^{-j}|\Omega|^{-1}$ apart:

$$T_j = \left\{\tau_{\min} + 2^{-j-1}|\Omega|^{-1} + k2^{-j}|\Omega|^{-1},\ k = 0, 1, \ldots, \lfloor 2^j|\Omega||T| \rfloor\right\}. \tag{37}$$

All points in $T_j$ belong to $T$, except possibly the final point in $T_j$, which may exceed $\tau_{\max}$ by no more than $2^{-j-1}|\Omega|^{-1}$. Moreover, if we denote by $\pi_j(\tau)$ the closest point in $T_j$ to a given point $\tau$, then $|\tau - \pi_j(\tau)| \leq 2^{-j-1}|\Omega|^{-1}$ for all $\tau \in T$. The points in the $T_j$ are arranged like nodes in a dyadic tree, with each "parent" in $T_j$ having two "children" in $T_{j+1}$ (the two points that are closer to the parent than to any other point in $T_j$); the only exception to this rule occurs if $\#T_{j+1}$ is odd, in which case the final point in $T_j$ has only one child in $T_{j+1}$.

We define $L_j$ to be the set of "links" that connect the parents in $T_j$ to their children in $T_{j+1}$:

$$L_j = \left\{(p, q) \in (T_j, T_{j+1}) \mid \pi_j(\tau) = p \text{ and } \pi_{j+1}(\tau) = q \text{ for some } \tau \in T\right\}. \tag{38}$$

Because of the one-dimensional structure of $T$ and the particular arrangement of the $T_j$'s, we observe that every child in $T_{j+1}$ is associated with only one link, and thus $\#L_j = \#T_{j+1} \leq 2^{j+1}|\Omega||T| + 1$. Furthermore, the length of every link is half of the distance between consecutive points on $T_{j+1}$; that is, $|q_j - p_j| = 2^{-j-2}|\Omega|^{-1}$ for all $(p_j, q_j) \in L_j$.
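The grid and link construction can be made concrete with a few lines of code. The sketch below builds $T_j$ as in (37) and pairs each point of $T_{j+1}$ with its parent in $T_j$ by index (child $k$ hangs off parent $\lfloor k/2 \rfloor$), which is our reading of the dyadic-tree description rather than a construction stated in the text; the function names and parameter values are hypothetical.

```python
import math

def grid(j, tau_min, t_len, band):
    """Grid T_j from (37): spacing 2^-j / band, offset by half a spacing."""
    spacing = 2.0 ** (-j) / band
    count = math.floor(2 ** j * band * t_len) + 1
    return [tau_min + 0.5 * spacing + k * spacing for k in range(count)]

def links(j, tau_min, t_len, band):
    """Pairs (parent in T_j, child in T_{j+1}); child k hangs off parent k // 2."""
    parents = grid(j, tau_min, t_len, band)
    children = grid(j + 1, tau_min, t_len, band)
    return [(parents[k // 2], q) for k, q in enumerate(children)]

# Example with |Omega| = 10 and T = [0, 1.37].
for j in range(3):
    pts = grid(j, 0.0, 1.37, 10.0)
    lk = links(j, 0.0, 1.37, 10.0)
    print(j, len(pts), len(lk), abs(lk[0][1] - lk[0][0]))
```

Printing the first link length at each level shows it halving from one scale to the next, in line with $|q_j - p_j| = 2^{-j-2}|\Omega|^{-1}$.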



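For almost every $\tau \in T$ [29, (6.46)], we can decompose $Z'(\tau)$ as a sum of the differences between approximations at different scales, writing the telescoping sum

$$Z'(\tau) = Z'(\pi_0(\tau)) + \sum_{j\geq 0}\left[Z'(\pi_{j+1}(\tau)) - Z'(\pi_j(\tau))\right].$$

Thus

$$|Z'(\tau)| \leq |Z'(\pi_0(\tau))| + \sum_{j\geq 0}\left|Z'(\pi_{j+1}(\tau)) - Z'(\pi_j(\tau))\right|,$$

and

$$\sup_{\tau\in T}|Z'(\tau)| \leq \max_{p_0\in T_0}|Z'(p_0)| + \sum_{j\geq 0}\max_{(p_j,q_j)\in L_j}\left|Z'(q_j) - Z'(p_j)\right|.$$

Therefore, for any $\lambda_1, \lambda_2 > 0$ and any sequence of positive numbers $\{u_j\}$ such that $\sum_{j\geq 0}u_j \leq 1$, we have

$$\begin{aligned}
\mathrm{P}_\epsilon\left\{\sup_{\tau\in T}|Z'(\tau)| > \lambda_1 + \lambda_2\right\} &\leq \mathrm{P}_\epsilon\left\{\left\{\max_{p_0\in T_0}|Z'(p_0)| > \lambda_1\right\} \cup \left\{\sum_{j\geq 0}\max_{(p_j,q_j)\in L_j}|Z'(q_j) - Z'(p_j)| > \lambda_2\right\}\right\} \\
&\leq \mathrm{P}_\epsilon\left\{\max_{p_0\in T_0}|Z'(p_0)| > \lambda_1\right\} + \mathrm{P}_\epsilon\left\{\sum_{j\geq 0}\max_{(p_j,q_j)\in L_j}|Z'(q_j) - Z'(p_j)| > \lambda_2\right\} \\
&\leq \mathrm{P}_\epsilon\left\{\max_{p_0\in T_0}|Z'(p_0)| > \lambda_1\right\} + \sum_{j\geq 0}\mathrm{P}_\epsilon\left\{\max_{(p_j,q_j)\in L_j}|Z'(q_j) - Z'(p_j)| > \lambda_2u_j\right\}.
\end{aligned} \tag{39}$$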



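To bound the first term in (39), we apply (35) along with the union bound and the fact that $\#T_0 \leq |\Omega||T| + 1$ to obtain

$$\mathrm{P}_\epsilon\left\{\max_{p_0\in T_0}|Z'(p_0)| > \lambda_1\right\} \leq 2(\#T_0)\exp\left(-\frac{\lambda_1^2}{4|A|^2M_3^2}\right) \leq (2|\Omega||T| + 2)\exp\left(-\frac{\lambda_1^2}{4|A|^2M_3^2}\right).$$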

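To bound the second term in (39), take $u_j = \sqrt{j + 3} \cdot 2^{-j-2}$ and assume that $\lambda_2 \geq \lambda_0/3$, where $\lambda_0 := 3|A|M_3\sqrt{\log(2|\Omega||T|)}$. Then, for every $j \geq 0$, we have

$$\begin{aligned}
\mathrm{P}_\epsilon\left\{\max_{(p_j,q_j)\in L_j}|Z'(q_j) - Z'(p_j)| > \lambda_2u_j\right\} &\leq 2(\#L_j)\exp\left(-\frac{\lambda_2^2u_j^2}{|A|^2M_3^2|\Omega|^2\left(2^{-j-2}|\Omega|^{-1}\right)^2}\right) \\
&= 2(\#L_j)\exp\left(-\frac{(j + 3)\lambda_2^2}{|A|^2M_3^2}\right) \\
&\leq 2(\#L_j)(2|\Omega||T|)^{-(j+2)}\exp\left(-\frac{\lambda_2^2}{|A|^2M_3^2}\right) \\
&\leq \left((|\Omega||T|)^{-j-1} + 2^{-j-1}(|\Omega||T|)^{-j-2}\right)\exp\left(-\frac{\lambda_2^2}{|A|^2M_3^2}\right),
\end{aligned}$$

where the first line above follows from applying (36) along with the union bound and the fact that $|q_j - p_j| = 2^{-j-2}|\Omega|^{-1}$ for all $(p_j, q_j) \in L_j$, the third line uses the assumption that $\lambda_2 \geq \lambda_0/3$, and the fourth line follows because $\#L_j \leq 2^{j+1}|\Omega||T| + 1$. If we assume that $|\Omega||T| \geq 3$, it follows that

$$\sum_{j\geq 0}\mathrm{P}_\epsilon\left\{\max_{(p_j,q_j)\in L_j}|Z'(q_j) - Z'(p_j)| > \lambda_2u_j\right\} \leq (17/30)\exp\left(-\frac{\lambda_2^2}{|A|^2M_3^2}\right).$$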

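Putting together our bounds for the first and second terms in (39), for any $\lambda \ge \lambda_0$ we may take $\lambda_1 = 2\lambda/3$ and $\lambda_2 = \lambda/3$ to conclude that

$$\begin{aligned}
\mathrm{P}_{\epsilon_k}\Big\{ \sup_{\tau \in T} |Z'(\tau)| > \lambda \Big\}
&\le (2|\Omega||T| + 2) \exp\left( \frac{-\lambda^2}{9|A|^2 M_3^2} \right) + \frac{17}{30} \exp\left( \frac{-\lambda^2}{9|A|^2 M_3^2} \right) \\
&\le (2|\Omega||T| + 2.57) \exp\left( \frac{-\lambda^2}{9|A|^2 M_3^2} \right) \\
&\le 3|\Omega||T| \exp\left( \frac{-\lambda^2}{9|A|^2 M_3^2} \right).
\end{aligned} \tag{40}$$

The third line above follows from our assumption that $|\Omega||T| \ge 3$. In the subsections that follow, we translate this conditional tail bound into unconditional expectation and tail bounds for $\sup_{\tau \in T} |Z'(\tau)|$.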
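The numerical constants in this chaining argument can be checked directly. The short script below is only a sanity check under the worst case $|\Omega||T| = 3$; the variable names and the truncation level `TERMS` are illustrative choices, not part of the original argument.

```python
import math

TERMS = 200  # truncation level; the neglected tails are far below double precision

# Chaining weights u_j = sqrt(j + 3) * 2^(-j-2); the argument needs sum_j u_j <= 1.
u_sum = sum(math.sqrt(j + 3) * 2.0 ** (-j - 2) for j in range(TERMS))

# Worst case x = |Omega||T| = 3 of sum_j [x^(-j-1) + 2^(-j-1) x^(-j-2)].
x = 3.0
series = sum(x ** (-j - 1) + 2.0 ** (-j - 1) * x ** (-j - 2) for j in range(TERMS))

# Constant added to 2|Omega||T| in the second line of (40): 2 + 17/30.
additive = 2.0 + 17.0 / 30.0

print(u_sum, series, additive)
```

The geometric series evaluates to $1/2 + 1/15 = 17/30$, and $2 + 17/30$ is rounded up to $2.57$, which is at most $|\Omega||T|$ whenever $|\Omega||T| \ge 3$.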



4.1.3 Completing the Proof of Theorem 1 (Expectation)

Conditioned on the choice of $\{\omega_k\}$ and $\{\omega'_k\}$, we can integrate the tail bound developed above to obtain an upper bound for $\mathrm{E}_{\epsilon_k} \sup_{\tau \in T} |Z'(\tau)|$. Note that, for any nonnegative random variable $V$, we have [29, Prop. 6.1]

$$\mathrm{E} V = \int_0^\infty \mathrm{P}\{V > u\}\, du. \tag{41}$$

Once we have bounded the average of the supremum of the conditioned process, it is then straightforward to extend this to a bound for $\mathrm{E} \sup_{\tau \in T} |Z'(\tau)|$ by removing the conditioning on $\{\omega_k\}$ and $\{\omega'_k\}$.



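Recall that $\lambda_0 = 3|A|M_3 \sqrt{\log(2|\Omega||T|)}$. Then it follows from the identity above that

$$\begin{aligned}
\mathrm{E}_{\epsilon_k} \sup_{\tau \in T} |Z'(\tau)|
&= \int_0^\infty \mathrm{P}_{\epsilon_k}\Big\{ \sup_{\tau \in T} |Z'(\tau)| > \lambda \Big\}\, d\lambda \\
&= \int_0^{\lambda_0} \mathrm{P}_{\epsilon_k}\Big\{ \sup_{\tau \in T} |Z'(\tau)| > \lambda \Big\}\, d\lambda + \int_{\lambda_0}^\infty \mathrm{P}_{\epsilon_k}\Big\{ \sup_{\tau \in T} |Z'(\tau)| > \lambda \Big\}\, d\lambda \\
&\le \lambda_0 + (2|\Omega||T| + 2.57) \int_{\lambda_0}^\infty \exp\left( \frac{-\lambda^2}{9|A|^2 M_3^2} \right) d\lambda \\
&\le \lambda_0 + (2|\Omega||T| + 2.57)\, \frac{9|A|^2 M_3^2}{2\lambda_0} \exp\left( \frac{-\lambda_0^2}{9|A|^2 M_3^2} \right) \\
&= 3|A|M_3 \sqrt{\log(2|\Omega||T|)} \cdot \left( 1 + \frac{2|\Omega||T| + 2.57}{4|\Omega||T| \log(2|\Omega||T|)} \right) \\
&\le |A|M_3 \left( 3\sqrt{\log(2|\Omega||T|)} + 1.61 \right).
\end{aligned} \tag{42}$$

The fourth line above follows from (34), and the sixth line follows from our assumption that $|\Omega||T| \ge 3$. Now it remains to remove the conditioning by taking the average over $\{\omega_k\}$ and $\{\omega'_k\}$. First note that, by Jensen's inequality, we have

$$\begin{aligned}
\mathrm{E}_{\omega_k,\omega'_k} M_3
&= \mathrm{E}_{\omega_k,\omega'_k} \sqrt{ \sum_{k=1}^m |\hat{s}(\omega_k)|^4 + |\hat{s}(\omega'_k)|^4 } \\
&\le \sqrt{ \sum_{k=1}^m \mathrm{E} |\hat{s}(\omega_k)|^4 + \mathrm{E} |\hat{s}(\omega'_k)|^4 } \\
&= \sqrt{ 2m\, \mathrm{E} |\hat{s}(\omega)|^4 } \\
&= \sqrt{ 2m |\Omega|^{-1} \|\hat{s}\|_4^4 } \\
&= \sqrt{2}\, M_1.
\end{aligned} \tag{43}$$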



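Now combining the above inequality with (42) brings us to

$$\begin{aligned}
\mathrm{E} \sup_{\tau \in T} |Z'(\tau)|
&= \mathrm{E}_{\omega_k,\omega'_k} \mathrm{E}_{\epsilon_k} \sup_{\tau \in T} |Z'(\tau)| \\
&\le \mathrm{E}_{\omega_k,\omega'_k} |A|M_3 \left( 3\sqrt{\log(2|\Omega||T|)} + 1.61 \right) \\
&\le |A|M_1 \left( 4.25\sqrt{\log(2|\Omega||T|)} + 2.28 \right).
\end{aligned}$$

The final link to the random process of interest is via Lemma 9:

$$\begin{aligned}
\mathrm{E} \sup_{\tau \in T} |Y(\tau)|
&\le \mathrm{E} \sup_{\tau \in T} |Z'(\tau)| \\
&\le |A|M_1 \left( 4.25\sqrt{\log(2|\Omega||T|)} + 2.28 \right) &\text{(44)} \\
&\le |A|M_1 \left( 4.25\sqrt{\log(2|\Omega||T|)} + \frac{2.28}{\sqrt{\log 6}} \sqrt{\log(2|\Omega||T|)} \right) \\
&\le 5.96 |A|M_1 \sqrt{\log(2|\Omega||T|)}, &\text{(45)}
\end{aligned}$$

where the third line follows because we assumed that $|\Omega||T| \ge 3$. This completes the proof of Theorem 1 after plugging (44) and (45) into (30a) or (30b).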
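The rounded constants $1.61$, $4.25$, $2.28$, and $5.96$ in (42), (44), and (45) can be reproduced numerically. The sketch below evaluates the relevant expressions at the worst case $|\Omega||T| = 3$ (and on a hypothetical grid of larger values, to confirm that $3$ is indeed the worst case); the function and variable names are illustrative.

```python
import math

def additive_term_42(x):
    """Second term of (42) divided by |A| M_3, as a function of x = |Omega||T|."""
    return 3.0 * (2.0 * x + 2.57) / (4.0 * x * math.sqrt(math.log(2.0 * x)))

# Evaluate on a grid x = 3, 3.5, 4, ...; the maximum is attained at x = 3.
worst_42 = max(additive_term_42(3.0 + 0.5 * i) for i in range(200))

c_sqrt = 3.0 * math.sqrt(2.0)                      # rounded up to 4.25
c_add = 1.61 * math.sqrt(2.0)                      # rounded up to 2.28
c_final = 4.25 + 2.28 / math.sqrt(math.log(6.0))   # rounded up to 5.96

print(worst_42, c_sqrt, c_add, c_final)
```

The factors of $\sqrt{2}$ come from the Jensen step $\mathrm{E} M_3 \le \sqrt{2} M_1$, and $\log 6$ is the value of $\log(2|\Omega||T|)$ at $|\Omega||T| = 3$.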



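4.1.4 Completing the Proof of Theorem 2 (Tail Bound)

Recall the tail bound obtained in (40). In this section, we remove the conditioning on $\{\omega_k\}$ and $\{\omega'_k\}$ to obtain a tail bound for the supremum of $|Y(\tau)|$ on $T$. From (43), recall that $M_3^2$ is a sum of independent bounded random variables, and thus it is closely concentrated about its average $\mathrm{E} M_3^2 = 2M_1^2$. To quantify this, we will use the classic Bernstein inequality, which is restated below for convenience.

Lemma 12. [1] Consider a sequence of independent zero-mean random variables $V_1, V_2, \ldots, V_m$ with $|V_k| \le B$ for $k = 1, 2, \ldots, m$. Then for any $\lambda > 0$, the following holds:

$$\mathrm{P}\Big\{ \sum_{k=1}^m V_k > \lambda \Big\} \le \exp\left( \frac{-\lambda^2}{2\rho^2 + 2B\lambda/3} \right),$$

where $\rho^2 = \sum_{k=1}^m \mathrm{E} V_k^2$.

Here, take $V_k = |\hat{s}(\omega_k)|^4 + |\hat{s}(\omega'_k)|^4 - 2|\Omega|^{-1} \|\hat{s}\|_4^4$ and note that

$$\mathrm{E} V_k = 2\,\mathrm{E}\left( |\hat{s}(\omega_k)|^4 - |\Omega|^{-1} \|\hat{s}\|_4^4 \right) = 0.$$

Also,

$$\begin{aligned}
|V_k| &\le \sup_{\omega, \omega'} \left| |\hat{s}(\omega)|^4 + |\hat{s}(\omega')|^4 - 2|\Omega|^{-1} \|\hat{s}\|_4^4 \right| \\
&\le \sup_{\omega, \omega'} \max\left( |\hat{s}(\omega)|^4 + |\hat{s}(\omega')|^4,\ 2|\Omega|^{-1} \|\hat{s}\|_4^4 \right) \\
&= \max\left( 2\sup_{\omega} |\hat{s}(\omega)|^4,\ 2|\Omega|^{-1} \|\hat{s}\|_4^4 \right) \\
&= 2\max\left( M^2,\ |\Omega|^{-1} \|\hat{s}\|_4^4 \right) \\
&= 2M^2.
\end{aligned}$$

The second line above follows from the convenient fact that $|a - b| \le \max(a, b)$ for any $a, b \ge 0$, and the last line follows from the fact that $\|\hat{s}\|_4^4 \le |\Omega| \sup_{\omega} |\hat{s}(\omega)|^4$. We also have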

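$$\begin{aligned}
\rho^2 &= \sum_{k=1}^m \mathrm{E}\left( |\hat{s}(\omega_k)|^4 + |\hat{s}(\omega'_k)|^4 - 2|\Omega|^{-1} \|\hat{s}\|_4^4 \right)^2 \\
&\le \sum_{k=1}^m \mathrm{E}\left( |\hat{s}(\omega_k)|^4 + |\hat{s}(\omega'_k)|^4 \right)^2 \\
&\le \sum_{k=1}^m 2\,\mathrm{E}|\hat{s}(\omega_k)|^8 + 2\,\mathrm{E}|\hat{s}(\omega'_k)|^8 \\
&= 4m\, \mathrm{E}|\hat{s}(\omega)|^8 \\
&\le 4mM^2\, \mathrm{E}|\hat{s}(\omega)|^4 \\
&= 4mM^2 |\Omega|^{-1} \|\hat{s}\|_4^4 = 4M^2 M_1^2.
\end{aligned}$$

The first line above follows because $\mathrm{E}(V - \mathrm{E}V)^2 = \mathrm{E}V^2 - (\mathrm{E}V)^2$ for any real-valued random variable $V$. The third line is implied by (32). Now we can apply the Bernstein inequality to $\sum_{k=1}^m V_k$ for any $\lambda > 0$ and obtain

$$\mathrm{P}\left\{ M_3^2 > 2M_1^2 + \lambda \right\} \le \exp\left( \frac{-\lambda^2}{8M^2 M_1^2 + (4/3)M^2 \lambda} \right). \tag{46}$$

Assume that $M_1 \ge M\sqrt{\log(4/\delta)}$. Then (46) implies that

$$\mathrm{P}\left\{ M_3^2 > 2M_1^2 + \lambda \right\} \le \exp\left( \frac{-\lambda^2 \log(4/\delta)}{8M_1^4 + (4/3)M_1^2 \lambda} \right).$$

Take $\lambda = 3.58 M_1^2$. Then

$$\mathrm{P}\left\{ M_3^2 > 5.58 M_1^2 \right\} \le \delta/4. \tag{47}$$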

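Now assume that $M_1 < M\sqrt{\log(4/\delta)}$. Then

$$\begin{aligned}
\mathrm{P}\left\{ M_3^2 > 2M^2 \log(4/\delta) + \lambda \right\} &\le \mathrm{P}\left\{ M_3^2 > 2M_1^2 + \lambda \right\} \\
&\le \exp\left( \frac{-\lambda^2}{8M^4 \log(4/\delta) + (4/3)M^2 \lambda} \right).
\end{aligned}$$

Take $\lambda = 3.58 M^2 \log(4/\delta)$. Then

$$\mathrm{P}\left\{ M_3^2 > 5.58 M^2 \log(4/\delta) \right\} \le \delta/4. \tag{48}$$

Therefore, combining (47) and (48), we arrive at

$$\mathrm{P}\left\{ M_3 > 2.37 \max\left( M_1,\ M\sqrt{\log(4/\delta)} \right) \right\} \le \delta/4.$$

Let $\mathcal{E}$ denote the event that $M_3 \le 2.37 \max\left( M_1,\ M\sqrt{\log(4/\delta)} \right)$. Then clearly,

$$\mathrm{P}\{\mathcal{E}\} \ge 1 - \delta/4. \tag{49}$$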
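The choice $\lambda = 3.58$ (in units of $M_1^2$, or of $M^2 \log(4/\delta)$ in the second case) and the constant $2.37$ can be verified with a few lines of arithmetic. This is only a numerical sanity check; the variable names are illustrative.

```python
import math

lam = 3.58  # lambda in units of M_1^2 (first case) or M^2 log(4/delta) (second case)

# In both cases the Bernstein exponent is at least ratio * log(4/delta),
# where ratio = lam^2 / (8 + 4 lam / 3); ratio >= 1 gives the bound delta/4.
ratio = lam ** 2 / (8.0 + 4.0 * lam / 3.0)

# M_3^2 <= (2 + lam) * (...) translates to M_3 <= sqrt(5.58) * (...), rounded up to 2.37.
root = math.sqrt(2.0 + lam)

print(ratio, root)
```

Since `ratio` is only slightly larger than $1$, the value $3.58$ is close to the smallest constant for which this step works.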

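On the other hand, taking $\lambda = 3|A|M_3 \sqrt{\log(12|\Omega||T|/\delta)}$, (40) implies that

$$\mathrm{P}_{\epsilon_k}\Big\{ \sup_{\tau \in T} |Z'(\tau)| > 3|A|M_3 \sqrt{\log(12|\Omega||T|/\delta)} \Big\} \le \delta/4. \tag{50}$$

Now we can combine (49) and (50) as follows. For notational convenience, set $b := 3|A|M_3 \sqrt{\log(12|\Omega||T|/\delta)}$, define

$$u := 7.11 |A| \max\left( M_1,\ M\sqrt{\log(4/\delta)} \right) \sqrt{\log(12|\Omega||T|/\delta)},$$

and note that $b \le u$ on the event $\mathcal{E}$, so that

$$\begin{aligned}
\mathrm{P}\Big\{ \sup_{\tau \in T} |Z'(\tau)| > u \Big\}
&= \mathrm{P}\Big\{ \sup_{\tau \in T} |Z'(\tau)| > u \,\Big|\, \mathcal{E} \Big\} \mathrm{P}\{\mathcal{E}\} + \mathrm{P}\Big\{ \sup_{\tau \in T} |Z'(\tau)| > u \,\Big|\, \mathcal{E}^C \Big\} \mathrm{P}\{\mathcal{E}^C\} \\
&\le \mathrm{P}\Big\{ \sup_{\tau \in T} |Z'(\tau)| > u \,\Big|\, \mathcal{E} \Big\} \mathrm{P}\{\mathcal{E}\} + \frac{\delta}{4} \\
&\le \mathrm{P}\Big\{ \sup_{\tau \in T} |Z'(\tau)| > b \,\Big|\, \mathcal{E} \Big\} \mathrm{P}\{\mathcal{E}\} + \frac{\delta}{4} \\
&= \int_{\mathcal{E}} \mathrm{P}_{\epsilon_k}\Big\{ \sup_{\tau \in T} |Z'(\tau)| > b \,\Big|\, \{\omega_k\}, \{\omega'_k\} \Big\}\, d\mu(\{\omega_k\}, \{\omega'_k\}) + \frac{\delta}{4} \\
&\le \int_{\mathcal{E}} \frac{\delta}{4}\, d\mu(\{\omega_k\}, \{\omega'_k\}) + \frac{\delta}{4} \\
&\le \frac{\delta}{2}.
\end{aligned}$$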
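The fourth line above follows from the definition of conditional measure. The final link to the random process of interest is via Lemma 10:

$$\begin{aligned}
\mathrm{P}\Big\{ \sup_{\tau \in T} |Y(\tau)| > 8.5|A|M_1 \sqrt{\log(2|\Omega||T|)} + 4.56|A|M_1 + u \Big\}
&\le \mathrm{P}\Big\{ \sup_{\tau \in T} |Y(\tau)| > 2\,\mathrm{E} \sup_{\tau \in T} |Y(\tau)| + u \Big\} \\
&\le 2\,\mathrm{P}\Big\{ \sup_{\tau \in T} |Z'(\tau)| > u \Big\} \\
&\le \delta,
\end{aligned} \tag{51}$$

where we used (44) in the first line above. This completes the proof of Theorem 2 as

$$\begin{aligned}
& 8.5|A|M_1 \sqrt{\log(2|\Omega||T|)} + 4.56|A|M_1 + u \\
&\le |A| \max\left( M_1, M\sqrt{\log(4/\delta)} \right) \left( 8.5\sqrt{\log(2|\Omega||T|)} + 4.56 + 7.11\sqrt{\log(12|\Omega||T|/\delta)} \right) \\
&\le |A| \max\left( M_1, M\sqrt{\log(4/\delta)} \right) \left( 15.61\sqrt{\log(12|\Omega||T|/\delta)} + 4.56 \right) \\
&\le |A| \max\left( M_1, M\sqrt{\log(4/\delta)} \right) \left( 15.61\sqrt{\log(12|\Omega||T|/\delta)} + \frac{4.56}{\sqrt{\log 36}} \sqrt{\log(12|\Omega||T|/\delta)} \right) \\
&\le C_1 |A| \max\left( M_1, M\sqrt{\log(4/\delta)} \right) \sqrt{\log(12|\Omega||T|/\delta)},
\end{aligned}$$

where the fourth line follows from the assumption that $|\Omega||T| \ge 3$, and the fifth line holds by taking $C_1 = 18.02$. Actually, we see from the above that we may slightly improve upon the value of $U$ specified in (10) by taking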

C/ = maxf^, ^ • Jlog{A/6)] ■ ( 15.61 Vlog(12|l^| |r|/5) + 4.56 
\^/m m J \ 



4.2 Proofs of Theorems 4 and 5 (Noisy Measurements)

Let us begin by noting that, in the real case, both Theorems 4 and 5 are concerned with bounding $|\mathrm{Re}(N(\tau))| \le |N(\tau)|$. In the complex case, both theorems are concerned with bounding $|N(\tau)|$ itself. Thus, to cover both cases, it suffices to focus on bounding $|N(\tau)|$.

We first bound $\mathrm{E} \sup_{\tau \in T} |N(\tau)|$, and then we show that $\sup_{\tau \in T} |N(\tau)|$ is sharply concentrated about its mean with high probability.



4.2.1 Proof of Theorem 4 (Expectation)

We begin by noting that, conditioned on $\{\omega_k\}$, $N(\tau)$ is a complex-valued Gaussian process to which we can apply a chaining argument similar to the one put forth in Section 4.1. With the $\{\omega_k\}$ fixed, at each $\tau$ the real and imaginary parts of $N(\tau)$ have the same Gaussian distributions, and thus the magnitude $|N(\tau)|$ is a Rayleigh random variable with second moment

$$\mathrm{E}_{n_k} |N(\tau)|^2 = \sigma_n^2 \sum_{k=1}^m |\hat{s}(\omega_k)|^2 = \sigma_n^2 M_4^2, \tag{52}$$

where the first equality follows from the independence of the $\{n_k\}$. It is known that a Rayleigh random variable $V$ with $\mathrm{E}V^2 = \sigma^2$ satisfies $\mathrm{P}\{V > \lambda\} = \exp(-\lambda^2/\sigma^2)$ [10], and thus a tail bound for $|N(\tau)|$ follows directly from the above:

$$\mathrm{P}_{n_k}\{ |N(\tau)| > \lambda \} = \exp\left( \frac{-\lambda^2}{\sigma_n^2 M_4^2} \right). \tag{53}$$
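The Rayleigh tail in (53) is easy to check by simulation. The sketch below assumes that, with the $\{\omega_k\}$ fixed, $N(\tau)$ is a weighted sum of independent circular complex Gaussian variables $n_k$ with $\mathrm{E}|n_k|^2 = \sigma_n^2$ and weights of magnitude $|\hat{s}(\omega_k)|$; the specific weights, the threshold, and the function name are hypothetical.

```python
import math
import random

def rayleigh_tail_check(weights, sigma_n, lam, trials=20000, seed=0):
    """Compare the empirical tail of |sum_k n_k w_k| with exp(-lam^2 / (sigma_n^2 sum w_k^2))."""
    rng = random.Random(seed)
    s = sigma_n / math.sqrt(2.0)  # per-component std, so that E|n_k|^2 = sigma_n^2
    hits = 0
    for _ in range(trials):
        re = sum(w * rng.gauss(0.0, s) for w in weights)
        im = sum(w * rng.gauss(0.0, s) for w in weights)
        if math.hypot(re, im) > lam:
            hits += 1
    second_moment = sigma_n ** 2 * sum(w * w for w in weights)  # plays the role of sigma_n^2 M_4^2
    return hits / trials, math.exp(-lam ** 2 / second_moment)

empirical, predicted = rayleigh_tail_check([1.0, 0.5, 2.0], sigma_n=1.0, lam=2.0)
print(empirical, predicted)
```

Only the magnitudes of the weights matter here, because multiplying a circular complex Gaussian variable by a unit-modulus phase does not change its distribution.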



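Likewise, the increment $|N(\tau_1) - N(\tau_2)|$ is Rayleigh with

$$\begin{aligned}
\mathrm{E}_{n_k} |N(\tau_1) - N(\tau_2)|^2
&= \sigma_n^2 \sum_{k=1}^m 4|\hat{s}(\omega_k)|^2 \sin^2\left( \omega_k(\tau_1 - \tau_2)/2 \right) \\
&\le \sigma_n^2 \sum_{k=1}^m |\hat{s}(\omega_k)|^2 |\omega_k|^2 |\tau_1 - \tau_2|^2 \\
&\le \frac{1}{4} \sigma_n^2 M_4^2 |\Omega|^2 |\tau_1 - \tau_2|^2.
\end{aligned} \tag{54}$$

The equality above follows from the independence of the $\{n_k\}$ together with (32), the second line uses the fact that $|\sin(a)| \le |a|$, and the third line uses the fact that $|\omega_k| \le |\Omega|/2$. The tail bound for the increment is then

$$\mathrm{P}_{n_k}\{ |N(\tau_1) - N(\tau_2)| > \lambda \} \le \exp\left( \frac{-4\lambda^2}{\sigma_n^2 M_4^2 |\Omega|^2 |\tau_1 - \tau_2|^2} \right).$$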



With the same definition of the sets $T_j$ and $L_j$ from (37) and (38) in Section 4.1.2, we can write the telescoping sum for $N(\tau)$ and proceed similarly to obtain

$$\sup_{\tau \in T} |N(\tau)| \le \max_{p_0 \in T_0} |N(p_0)| + \sum_{j \ge 0} \max_{(p_j,q_j) \in L_j} |N(q_j) - N(p_j)|,$$

and so it follows immediately that

$$\mathrm{E}_{n_k} \sup_{\tau \in T} |N(\tau)| \le \mathrm{E}_{n_k} \max_{p_0 \in T_0} |N(p_0)| + \sum_{j \ge 0} \mathrm{E}_{n_k} \max_{(p_j,q_j) \in L_j} |N(q_j) - N(p_j)|. \tag{55}$$



We now use the following standard result that bounds the expected maximum of a finite set of subgaussian random variables. The proof is included in Appendix D.

Proposition 13. Let $V_1, V_2, \ldots, V_N$ be random variables with $\mathrm{P}\{|V_i| > \lambda\} \le K e^{-\lambda^2/(2\sigma^2)}$. Then

$$\mathrm{E} \max_{1 \le i \le N} |V_i| \le \sigma \left( \sqrt{2\log(KN)} + \frac{1}{\sqrt{2\log(KN)}} \right).$$

Applying Proposition 13 along with (53) allows us to bound the first term in (55):

$$\begin{aligned}
\mathrm{E}_{n_k} \max_{p_0 \in T_0} |N(p_0)|
&\le \frac{\sigma_n M_4}{\sqrt{2}} \left( \sqrt{2\log(|\Omega||T| + 1)} + \frac{1}{\sqrt{2\log(|\Omega||T|)}} \right) \\
&\le \frac{\sigma_n M_4}{\sqrt{2}} \left( \sqrt{2\log(|\Omega||T|)} + \frac{|\Omega||T| + 1}{|\Omega||T| \sqrt{2\log(|\Omega||T|)}} \right) \\
&\le \sigma_n M_4 \left( \sqrt{\log(|\Omega||T|)} + 0.64 \right).
\end{aligned}$$

The first line uses the fact that $|\Omega||T| \le \#T_0 \le |\Omega||T| + 1$, the second line follows from the convenient fact that for any $a > 1$ we have

$$\sqrt{\log(a+1)} - \sqrt{\log a} \le \frac{1}{2a\sqrt{\log a}}, \tag{56}$$

and the third line follows from the assumption that $|\Omega||T| \ge 3$.



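For the multiscale sum of expected supremums in (55), we can apply Proposition 13 with $\sigma = \frac{1}{2\sqrt{2}} \sigma_n M_4 |\Omega| |q_j - p_j| = \frac{\sigma_n M_4}{\sqrt{2}} 2^{-j-3}$ for every $j \ge 0$, and obtain

$$\begin{aligned}
\sum_{j \ge 0} \mathrm{E}_{n_k} \max_{(p_j,q_j) \in L_j} |N(p_j) - N(q_j)|
&\le \frac{\sigma_n M_4}{\sqrt{2}} \sum_{j \ge 0} 2^{-j-3} \left( \sqrt{2\log(2^{j+1}|\Omega||T| + 1)} + \frac{1}{\sqrt{2\log(2^{j+1}|\Omega||T|)}} \right) \\
&\le \frac{\sigma_n M_4}{\sqrt{2}} \sum_{j \ge 0} 2^{-j-3} \left( \sqrt{2\log(2^{j+1}|\Omega||T|)} + \frac{2^{j+1}|\Omega||T| + 1}{2^{j+1}|\Omega||T| \sqrt{2\log(2^{j+1}|\Omega||T|)}} \right) \\
&\le \frac{\sigma_n M_4}{8\sqrt{2}} \sum_{j \ge 0} 2^{-j} \left( \sqrt{2\log(|\Omega||T|)} + \sqrt{2(j+1)\log 2} + 0.62 \right) \\
&\le \frac{\sigma_n M_4}{8\sqrt{2}} \left( 2\sqrt{2\log(|\Omega||T|)} + 4.42 \right) \\
&\le \sigma_n M_4 \left( 0.25\sqrt{\log(|\Omega||T|)} + 0.4 \right),
\end{aligned}$$

where the first line uses the fact that $2^{j+1}|\Omega||T| \le \#L_j \le 2^{j+1}|\Omega||T| + 1$, the second line uses (56), and the third line uses the assumption that $|\Omega||T| \ge 3$. Consequently, in light of (55), we obtain

$$\mathrm{E}_{n_k} \sup_{\tau \in T} |N(\tau)| \le \sigma_n M_4 \left( 1.25\sqrt{\log(|\Omega||T|)} + 1.04 \right). \tag{57}$$

Removing the conditioning on $\{\omega_k\}$, we arrive at

$$\begin{aligned}
\mathrm{E} \sup_{\tau \in T} |N(\tau)|
&= \mathrm{E}_{\omega_k} \mathrm{E}_{n_k} \sup_{\tau \in T} |N(\tau)| \\
&\le \sigma_n \left( 1.25\sqrt{\log(|\Omega||T|)} + 1.04 \right) \mathrm{E}_{\omega_k} M_4 \\
&\le \sigma_n \left( 1.25\sqrt{\log(|\Omega||T|)} + 1.04 \right) \sqrt{ \mathrm{E}_{\omega_k} \sum_{k=1}^m |\hat{s}(\omega_k)|^2 } \\
&= \sigma_n \left( 1.25\sqrt{\log(|\Omega||T|)} + 1.04 \right) \sqrt{ m|\Omega|^{-1} \int_{\Omega} |\hat{s}(\omega)|^2\, d\omega } \\
&= \sigma_n M_2 \left( 1.25\sqrt{\log(|\Omega||T|)} + 1.04 \right) \\
&\le 2.25 \sigma_n M_2 \sqrt{\log(|\Omega||T|)},
\end{aligned}$$

where the third line uses Jensen's inequality, and the last line follows from our assumption that $|\Omega||T| \ge 3$. This completes the proof of Theorem 4.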
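The rounded constants in the proof of Theorem 4 ($0.64$, $0.62$, $4.42$, $0.4$, and $2.25$) can be reproduced by evaluating the corresponding expressions at the worst case $|\Omega||T| = 3$. The script below is a numerical sanity check only; the variable names are illustrative, and the infinite sum is truncated at 200 terms.

```python
import math

x = 3.0  # smallest admissible value of |Omega||T|

# First term: (1/sqrt 2) * (x + 1) / (x sqrt(2 log x)), rounded up to 0.64.
c_first = (x + 1.0) / (2.0 * x * math.sqrt(math.log(x)))

# Per-scale additive term at j = 0: (2x + 1) / (2x sqrt(2 log 2x)), rounded up to 0.62.
c_scale = (2.0 * x + 1.0) / (2.0 * x * math.sqrt(2.0 * math.log(2.0 * x)))

# sum_j 2^(-j) (sqrt(2 (j + 1) log 2) + 0.62), rounded up to 4.42.
series = sum(2.0 ** (-j) * (math.sqrt(2.0 * (j + 1) * math.log(2.0)) + 0.62)
             for j in range(200))

c_links = 4.42 / (8.0 * math.sqrt(2.0))          # rounded up to 0.4
c_total = 1.25 + 1.04 / math.sqrt(math.log(x))   # rounded up to 2.25

print(c_first, c_scale, series, c_links, c_total)
```

The per-scale term is largest at $j = 0$ and $|\Omega||T| = 3$, which is why a single constant $0.62$ can be used for every scale.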



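4.2.2 Tail Bound (Proof of Theorem 5)

Recall that, conditioned on $\{\omega_k\}$, $N(\tau)$ is a centered complex-valued Gaussian process. The following result, proved in Appendix E, provides a sharp tail bound for the supremum of this random process.

Lemma 14. Let $\{G(t), t \in \mathcal{A}\}$ be a centered complex-valued Gaussian process. Define the weak variance as $\nu^2 := \sup_{t \in \mathcal{A}} \mathrm{E}|G(t)|^2$. Then the following holds for any $\lambda > 0$:

$$\mathrm{P}\Big\{ \sup_{t \in \mathcal{A}} |G(t)| > \mathrm{E} \sup_{t \in \mathcal{A}} |G(t)| + \lambda \Big\} \le \exp\left( \frac{-\lambda^2}{2\nu^2} \right).$$

To apply this bound to $N(\tau)$ conditioned on $\{\omega_k\}$, notice that (52) directly implies $\nu^2 = \sigma_n^2 M_4^2$. Therefore, using Lemma 14, we obtain the following for any $\lambda > 0$:

$$\mathrm{P}_{n_k}\Big\{ \sup_{\tau \in T} |N(\tau)| > \mathrm{E}_{n_k} \sup_{\tau \in T} |N(\tau)| + \lambda \Big\} \le \exp\left( \frac{-\lambda^2}{2\sigma_n^2 M_4^2} \right). \tag{58}$$

Now, recall (57) and note that due to our assumption that $|\Omega||T| \ge 3$, we have $\mathrm{E}_{n_k} \sup_{\tau \in T} |N(\tau)| \le 2.25 \sigma_n M_4 \sqrt{\log(|\Omega||T|)}$. Combining this fact with (58), take $\lambda = \sigma_n M_4 \sqrt{2\log(2/\delta)}$ to get

$$\begin{aligned}
&\mathrm{P}_{n_k}\Big\{ \sup_{\tau \in T} |N(\tau)| > 2\sigma_n M_4 \max\left( 2.25\sqrt{\log(|\Omega||T|)},\ \sqrt{2\log(2/\delta)} \right) \Big\} \\
&\quad\le \mathrm{P}_{n_k}\Big\{ \sup_{\tau \in T} |N(\tau)| > 2.25\sigma_n M_4 \sqrt{\log(|\Omega||T|)} + \sigma_n M_4 \sqrt{2\log(2/\delta)} \Big\} \\
&\quad\le \delta/2.
\end{aligned} \tag{59}$$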


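Now, we need only to show that $M_4$ is small with high probability. For this, we will use the Bernstein inequality (Lemma 12). To apply the Bernstein inequality, take $V_k = |\hat{s}(\omega_k)|^2 - |\Omega|^{-1} \|\hat{s}\|_2^2$. Notice that

$$\mathrm{E} V_k = \mathrm{E}|\hat{s}(\omega_k)|^2 - |\Omega|^{-1} \|\hat{s}\|_2^2 = \int_{\Omega} |\hat{s}(\omega)|^2 |\Omega|^{-1}\, d\omega - |\Omega|^{-1} \|\hat{s}\|_2^2 = |\Omega|^{-1} \|\hat{s}\|_2^2 - |\Omega|^{-1} \|\hat{s}\|_2^2 = 0,$$

and also that

$$\begin{aligned}
|V_k| &\le \sup_{\omega} \left| |\hat{s}(\omega)|^2 - |\Omega|^{-1} \|\hat{s}\|_2^2 \right| \\
&\le \sup_{\omega} \max\left( |\hat{s}(\omega)|^2,\ |\Omega|^{-1} \|\hat{s}\|_2^2 \right) \\
&= \max\left( \sup_{\omega} |\hat{s}(\omega)|^2,\ |\Omega|^{-1} \|\hat{s}\|_2^2 \right) \\
&= \max\left( M,\ |\Omega|^{-1} \|\hat{s}\|_2^2 \right) \\
&= M,
\end{aligned}$$

where the second line uses the convenient fact that $|a - b| \le \max(a, b)$ for any $a, b \ge 0$, and the fifth line uses the fact that $\|\hat{s}\|_2^2 \le |\Omega| \sup_{\omega} |\hat{s}(\omega)|^2$. In addition, we know that

$$\mathrm{E} V_k^2 = \mathrm{E}|\hat{s}(\omega_k)|^4 - |\Omega|^{-2} \|\hat{s}\|_2^4 \le \mathrm{E}|\hat{s}(\omega_k)|^4 = |\Omega|^{-1} \|\hat{s}\|_4^4,$$

where the first equality follows from the convenient fact that $\mathrm{E}|V - \mathrm{E}V|^2 = \mathrm{E}|V|^2 - |\mathrm{E}V|^2$ for any random variable $V$. Therefore, we have $\rho^2 = \sum_{k=1}^m \mathrm{E} V_k^2 \le m|\Omega|^{-1} \|\hat{s}\|_4^4 = M_1^2$. Now we can apply the Bernstein inequality to $\sum_{k=1}^m V_k$ for any $\lambda > 0$ and obtain

$$\mathrm{P}\left\{ M_4^2 > m|\Omega|^{-1} \|\hat{s}\|_2^2 + \lambda \right\} \le \exp\left( \frac{-\lambda^2}{2M_1^2 + 2M\lambda/3} \right).$$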

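Suppose first that $M_1 \ge (M/3)\sqrt{\log(2/\delta)}$. Then take $\lambda = aM_1\sqrt{\log(2/\delta)}$ for some $a > 0$ to get

$$\begin{aligned}
\mathrm{P}\left\{ M_4^2 > m|\Omega|^{-1} \|\hat{s}\|_2^2 + aM_1\sqrt{\log(2/\delta)} \right\}
&\le \exp\left( \frac{-a^2 M_1^2 \log(2/\delta)}{2M_1^2 + 2aMM_1\sqrt{\log(2/\delta)}/3} \right) \\
&\le \exp\left( \frac{-a^2 M_1^2 \log(2/\delta)}{2M_1^2 + 2aM_1^2} \right) \\
&= \exp\left( -\frac{a^2}{2 + 2a} \log(2/\delta) \right) \\
&\le \delta/2,
\end{aligned}$$

which is valid for $a \ge 1 + \sqrt{3}$.

Now suppose that $M_1 < (M/3)\sqrt{\log(2/\delta)}$. Take $\lambda = aM\log(2/\delta)$. Then

$$\begin{aligned}
\mathrm{P}\left\{ M_4^2 > m|\Omega|^{-1} \|\hat{s}\|_2^2 + aM\log(2/\delta) \right\}
&\le \exp\left( \frac{-a^2 M^2 \log^2(2/\delta)}{2M_1^2 + (2a/3)M^2 \log(2/\delta)} \right) \\
&\le \exp\left( \frac{-a^2 M^2 \log^2(2/\delta)}{(2/9)M^2 \log(2/\delta) + (2a/3)M^2 \log(2/\delta)} \right) \\
&= \exp\left( \frac{-a^2 \log(2/\delta)}{2/9 + 2a/3} \right) \\
&\le \delta/2,
\end{aligned}$$

which is valid for $a \ge (1 + \sqrt{3})/3$.

So with probability at least $1 - \delta/2$ we will have

$$M_4^2 \le m|\Omega|^{-1} \|\hat{s}\|_2^2 + a\max\left( M_1\sqrt{\log(2/\delta)},\ M\log(2/\delta) \right), \tag{60}$$

for $a \ge 1 + \sqrt{3}$.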
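The threshold $a \ge 1 + \sqrt{3}$ can be confirmed numerically: it is exactly the point at which the exponent multiplier $a^2/(2+2a)$ of the first case reaches $1$, and the multiplier $a^2/(2/9 + 2a/3)$ of the second case is then comfortably larger than $1$. The helper names below are illustrative.

```python
import math

def case1(a):
    """Exponent multiplier a^2 / (2 + 2a) from the case M_1 >= (M/3) sqrt(log(2/delta))."""
    return a * a / (2.0 + 2.0 * a)

def case2(a):
    """Exponent multiplier a^2 / (2/9 + 2a/3) from the complementary case."""
    return a * a / (2.0 / 9.0 + 2.0 * a / 3.0)

a_star = 1.0 + math.sqrt(3.0)
print(case1(a_star), case2(a_star))
```

A multiplier of at least $1$ turns $\exp(-c \log(2/\delta))$ into a bound of at most $\delta/2$, which is all that either case requires.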

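By the definition of $M_1$, note that $m|\Omega|^{-1} \|\hat{s}\|_2^2 \ge aM_1\sqrt{\log(2/\delta)}$ is equivalent to $m|\Omega|^{-1} \|\hat{s}\|_2^2 \ge a\sqrt{m|\Omega|^{-1}}\, \|\hat{s}\|_4^2 \sqrt{\log(2/\delta)}$, which is equivalent to $m \ge a^2 |\Omega| \frac{\|\hat{s}\|_4^4}{\|\hat{s}\|_2^4} \log(2/\delta)$. Also, $m|\Omega|^{-1} \|\hat{s}\|_2^2 \ge aM\log(2/\delta)$ is equivalent to $m \ge a \frac{M|\Omega|}{\|\hat{s}\|_2^2} \log(2/\delta)$. Therefore, in order for the first term on the right hand side of (60) to be dominant we conveniently assume that

$$m \ge a^2 \max\left( \frac{|\Omega| \|\hat{s}\|_4^4}{\|\hat{s}\|_2^4},\ \frac{M|\Omega|}{\|\hat{s}\|_2^2} \right) \log(2/\delta), \tag{61}$$

where we used the fact that $a \ge 1$. Under this assumption,

$$M_4^2 \le 2m|\Omega|^{-1} \|\hat{s}\|_2^2 = 2M_2^2 \tag{62}$$

with probability exceeding $1 - \delta/2$. Let $\mathcal{E}$ denote the event specified in (62); clearly if (61) is met, then $\mathrm{P}\{\mathcal{E}\} \ge 1 - \delta/2$. Now, using (59), we have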

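$$\begin{aligned}
&\mathrm{P}\Big\{ \sup_{\tau \in T} |N(\tau)| > 2\sqrt{2}\sigma_n M_2 \max\left( 2.25\sqrt{\log(|\Omega||T|)},\ \sqrt{2\log(2/\delta)} \right) \,\Big|\, \mathcal{E} \Big\} \\
&\quad\le \mathrm{P}\Big\{ \sup_{\tau \in T} |N(\tau)| > 2\sigma_n M_4 \max\left( 2.25\sqrt{\log(|\Omega||T|)},\ \sqrt{2\log(2/\delta)} \right) \,\Big|\, \mathcal{E} \Big\} \\
&\quad= \frac{1}{\mathrm{P}\{\mathcal{E}\}} \int_{\mathcal{E}} \mathrm{P}_{n_k}\Big\{ \sup_{\tau \in T} |N(\tau)| > 2\sigma_n M_4 \max\left( 2.25\sqrt{\log(|\Omega||T|)},\ \sqrt{2\log(2/\delta)} \right) \,\Big|\, \{\omega_k\} \Big\}\, d\mu(\{\omega_k\}) \\
&\quad\le \frac{\delta}{2\mathrm{P}\{\mathcal{E}\}} \int_{\mathcal{E}} d\mu(\{\omega_k\}) \\
&\quad= \frac{\delta}{2},
\end{aligned} \tag{63}$$

where the third line above follows from the definition of conditional measure. On the other hand, for any $u > 0$, we have

$$\begin{aligned}
\mathrm{P}\Big\{ \sup_{\tau \in T} |N(\tau)| > u \Big\}
&= \mathrm{P}\Big\{ \sup_{\tau \in T} |N(\tau)| > u \,\Big|\, \mathcal{E} \Big\} \mathrm{P}\{\mathcal{E}\} + \mathrm{P}\Big\{ \sup_{\tau \in T} |N(\tau)| > u \,\Big|\, \mathcal{E}^C \Big\} \mathrm{P}\{\mathcal{E}^C\} \\
&\le \mathrm{P}\Big\{ \sup_{\tau \in T} |N(\tau)| > u \,\Big|\, \mathcal{E} \Big\} + \mathrm{P}\{\mathcal{E}^C\}.
\end{aligned}$$

Taking $u = 2\sqrt{2}\sigma_n M_2 \max\left( 2.25\sqrt{\log(|\Omega||T|)},\ \sqrt{2\log(2/\delta)} \right)$ and using (63), we finally obtain

$$\begin{aligned}
&\mathrm{P}\Big\{ \sup_{\tau \in T} |N(\tau)| > 2\sqrt{2}\sigma_n M_2 \max\left( 2.25\sqrt{\log(|\Omega||T|)},\ \sqrt{2\log(2/\delta)} \right) \Big\} \\
&\quad\le \mathrm{P}\Big\{ \sup_{\tau \in T} |N(\tau)| > 2\sqrt{2}\sigma_n M_2 \max\left( 2.25\sqrt{\log(|\Omega||T|)},\ \sqrt{2\log(2/\delta)} \right) \,\Big|\, \mathcal{E} \Big\} + \mathrm{P}\{\mathcal{E}^C\} \\
&\quad\le \delta,
\end{aligned}$$

which is valid when (61) is met and $a \ge 1 + \sqrt{3}$. Setting $a = 1 + \sqrt{3}$, $C_3 = a^2$, and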
C4 = ^--^^^ ^ we complete the proof of Theorem [sj 
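A Proof of Lemma 9

Let $\mathrm{E}_{\omega_k}$ and $\mathrm{E}_{\omega'_k}$ denote expectation with respect to $\{\omega_k\}$ and $\{\omega'_k\}$, respectively. Then we can write

$$\begin{aligned}
\mathrm{E} \sup_{\tau} |Y(\tau)|
&= \mathrm{E}_{\omega_k} \sup_{\tau} \left| Y(\tau) - \mathrm{E}_{\omega'_k} Y'(\tau) \right| && (Y'(\tau) \text{ is zero mean}) \\
&= \mathrm{E}_{\omega_k} \sup_{\tau} \left| \mathrm{E}_{\omega'_k} \left[ Y(\tau) - Y'(\tau) \right] \right| && (\text{independence; } \mathrm{E}_{\omega'_k} Y = Y) \\
&\le \mathrm{E}_{\omega_k} \mathrm{E}_{\omega'_k} \sup_{\tau} |Y(\tau) - Y'(\tau)| && (\text{Jensen's inequality, } \sup|\cdot| \text{ is convex}) \\
&= \mathrm{E} \sup_{\tau} |Y(\tau) - Y'(\tau)| && (\text{iterated expectation}) \\
&= \mathrm{E} \sup_{\tau} |Z(\tau)|.
\end{aligned}$$

Finally, since $Z(\tau)$ has the same distribution as $Z'(\tau)$, $\mathrm{E} \sup_{\tau} |Z(\tau)| = \mathrm{E} \sup_{\tau} |Z'(\tau)|$.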



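B Proof of Lemma 10

Recall that, for every $\tau \in T$, $Y(\tau)$ was symmetrized by creating an independent copy $Y'(\tau)$ and defining $Z(\tau) = Y(\tau) - Y'(\tau)$. Therefore, for any $a, \lambda > 0$, the occurrence of the events $\{\sup_{\tau} |Y(\tau)| > a + \lambda\}$ and $\{\sup_{\tau} |Y'(\tau)| \le a\}$ imply that

$$\lambda < \sup_{\tau} |Y(\tau)| - \sup_{\tau} |Y'(\tau)| \le \sup_{\tau} \left( |Y(\tau)| - |Y'(\tau)| \right) \le \sup_{\tau} |Y(\tau) - Y'(\tau)| = \sup_{\tau} |Z(\tau)|.$$

Setting $a = 2\,\mathrm{E} \sup_{\tau} |Y(\tau)|$ and recalling that $Z(\tau)$ and $Z'(\tau)$ have the same distribution, we conclude that

$$\begin{aligned}
\mathrm{P}\Big\{ \sup_{\tau} |Z'(\tau)| > \lambda \Big\}
&\ge \mathrm{P}\Big\{ \Big\{ \sup_{\tau} |Y(\tau)| > a + \lambda \Big\} \cap \Big\{ \sup_{\tau} |Y'(\tau)| \le a \Big\} \Big\} \\
&= \mathrm{P}\Big\{ \sup_{\tau} |Y(\tau)| > 2\,\mathrm{E} \sup_{\tau} |Y(\tau)| + \lambda \Big\}\, \mathrm{P}\Big\{ \sup_{\tau} |Y(\tau)| \le 2\,\mathrm{E} \sup_{\tau} |Y(\tau)| \Big\} \\
&\ge \frac{1}{2}\, \mathrm{P}\Big\{ \sup_{\tau} |Y(\tau)| > 2\,\mathrm{E} \sup_{\tau} |Y(\tau)| + \lambda \Big\},
\end{aligned}$$

where the second line follows from the fact that $Y$ and $Y'$ are independent copies of the same random process, and the third line follows from applying the Markov inequality, which states that for any nonnegative random variable $V$ and any $c > 0$, we have $\mathrm{P}\{V > c\} \le c^{-1}\, \mathrm{E}V$.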



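C Proof of Lemma 11

Hoeffding's inequality for Rademacher sums [29, Prop. 6.11] states that if $b_1, b_2, \ldots, b_N$ are complex numbers and $\epsilon_1, \epsilon_2, \ldots, \epsilon_N$ is a Rademacher series, then for every $\lambda > 0$ we have

$$\mathrm{P}\Big\{ \Big| \sum_{k=1}^N \epsilon_k b_k \Big| > \lambda \Big\} \le 2\exp\left( \frac{-\lambda^2}{2\sum_{k=1}^N |b_k|^2} \right). \tag{64}$$

Conditioned on $\{\omega_k\}$ and $\{\omega'_k\}$, we can write

$$Z'(\tau) = \sum_{k=1}^m \epsilon_k a_k, \quad \text{for some } a_k \text{ with } |a_k| = \left| A|\hat{s}(\omega_k)|^2 e^{j\omega_k(\tau - \tau_0)} - A|\hat{s}(\omega'_k)|^2 e^{j\omega'_k(\tau - \tau_0)} \right|,$$

and therefore,

$$\begin{aligned}
\sum_{k=1}^m |a_k|^2 &= \sum_{k=1}^m \left| A|\hat{s}(\omega_k)|^2 e^{j\omega_k(\tau - \tau_0)} - A|\hat{s}(\omega'_k)|^2 e^{j\omega'_k(\tau - \tau_0)} \right|^2 \\
&\le |A|^2 \sum_{k=1}^m 2|\hat{s}(\omega_k)|^4 + 2|\hat{s}(\omega'_k)|^4 \\
&= 2|A|^2 M_3^2,
\end{aligned}$$

where the second line uses (33). Plugging into (64) yields (35) as desired.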
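For the increment bound, conditioned on $\{\omega_k\}$ and $\{\omega'_k\}$, we can write

$$Z'(\tau_1) - Z'(\tau_2) = \sum_{k=1}^m \epsilon_k \beta_k,$$

where

$$\begin{aligned}
\sum_{k=1}^m |\beta_k|^2
&= \sum_{k=1}^m \left| A|\hat{s}(\omega_k)|^2 \left( e^{j\omega_k(\tau_1 - \tau_0)} - e^{j\omega_k(\tau_2 - \tau_0)} \right) - A|\hat{s}(\omega'_k)|^2 \left( e^{j\omega'_k(\tau_1 - \tau_0)} - e^{j\omega'_k(\tau_2 - \tau_0)} \right) \right|^2 \\
&\le 8|A|^2 \sum_{k=1}^m |\hat{s}(\omega_k)|^4 \sin^2\left( \tfrac{1}{2}\omega_k(\tau_1 - \tau_2) \right) + |\hat{s}(\omega'_k)|^4 \sin^2\left( \tfrac{1}{2}\omega'_k(\tau_2 - \tau_1) \right) \\
&\le 8|A|^2 \sum_{k=1}^m |\hat{s}(\omega_k)|^4 \left| \tfrac{1}{2}\omega_k(\tau_1 - \tau_2) \right|^2 + |\hat{s}(\omega'_k)|^4 \left| \tfrac{1}{2}\omega'_k(\tau_2 - \tau_1) \right|^2 \\
&\le 8|A|^2 \sum_{k=1}^m \left( |\hat{s}(\omega_k)|^4 + |\hat{s}(\omega'_k)|^4 \right) \left| \frac{|\Omega|}{4}(\tau_2 - \tau_1) \right|^2 \\
&= \frac{1}{2}|A|^2 M_3^2 |\Omega|^2 |\tau_1 - \tau_2|^2.
\end{aligned}$$

The second line above follows from applying (33) and then (32), the third line uses the fact that $|\sin(a)| \le |a|$, and the fourth line follows from the fact that $\Omega$ is symmetric about the origin, i.e., that $\Omega = [-\omega_{\max}, \omega_{\max}]$. Plugging into (64) yields (36) as desired.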



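D Proof of Proposition 13

It follows from (41) that

$$\mathrm{E} \max_i |V_i| = \int_0^\infty \mathrm{P}\Big\{ \max_i |V_i| > \lambda \Big\}\, d\lambda.$$

By breaking the integration interval at $\lambda_0 := \sigma\sqrt{2\log(KN)}$, we can write

$$\begin{aligned}
\mathrm{E} \max_i |V_i|
&= \int_0^{\lambda_0} \mathrm{P}\Big\{ \max_i |V_i| > \lambda \Big\}\, d\lambda + \int_{\lambda_0}^\infty \mathrm{P}\Big\{ \max_i |V_i| > \lambda \Big\}\, d\lambda \\
&\le \int_0^{\lambda_0} d\lambda + \int_{\lambda_0}^\infty N \max_i \mathrm{P}\{ |V_i| > \lambda \}\, d\lambda \\
&\le \lambda_0 + \int_{\lambda_0}^\infty KN e^{-\lambda^2/(2\sigma^2)}\, d\lambda \\
&\le \lambda_0 + KN \cdot \frac{\sigma^2}{\lambda_0} e^{-\lambda_0^2/(2\sigma^2)} \\
&= \sigma \left( \sqrt{2\log(KN)} + \frac{1}{\sqrt{2\log(KN)}} \right),
\end{aligned}$$

as claimed. The second line above uses the union bound, and the fourth line uses (34).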
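As an illustration, Proposition 13 can be compared against a Monte Carlo estimate for i.i.d. standard Gaussian variables, which satisfy the tail condition with $K = 2$ and $\sigma = 1$. The sample sizes, seed, and function names below are hypothetical, and independence is assumed only for convenience of simulation (the proposition itself does not require it).

```python
import math
import random

def prop13_bound(sigma, K, N):
    """Right-hand side of Proposition 13: sigma (sqrt(2 log KN) + 1 / sqrt(2 log KN))."""
    root = math.sqrt(2.0 * math.log(K * N))
    return sigma * (root + 1.0 / root)

def empirical_expected_max(N, trials, seed=0):
    """Monte Carlo estimate of E max_i |V_i| for N i.i.d. standard Gaussian variables."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(abs(rng.gauss(0.0, 1.0)) for _ in range(N))
    return total / trials

N = 100
estimate = empirical_expected_max(N, trials=2000)
bound = prop13_bound(sigma=1.0, K=2, N=N)
print(estimate, bound)
```

The estimate sits below the bound with visible slack, as expected from a bound that only uses the union bound and a Gaussian tail integral.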



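E Proof of Lemma 14

The proof essentially follows [22, p. 134]. Fix $t_1, \ldots, t_N$ in $\mathcal{A}$ and form the vector $g := [G(t_1), G(t_2), \ldots, G(t_N)]^T \in \mathbb{C}^N$ with covariance matrix $\Gamma = \Sigma\Sigma^*$. This vector has the same distribution as $\Sigma h$, where $h \in \mathbb{R}^N$ is the standard Gaussian vector, whose entries are i.i.d. zero-mean Gaussian random variables with unit variance. This is because $\mathrm{E}\,\Sigma h = \Sigma \cdot \mathrm{E}h = 0_N = \mathrm{E}g$ and $\mathrm{E}\{(\Sigma h)(\Sigma h)^*\} = \Sigma\, \mathrm{E}\{hh^T\}\, \Sigma^* = \Sigma\Sigma^* = \Gamma = \mathrm{E}\{gg^*\}$. Here $0_N$ denotes the $N \times 1$ zero vector. Now consider the function $F: \mathbb{R}^N \to \mathbb{R}$ defined as $F(x) := \max_{1 \le i \le N} |(\Sigma x)_i| = \|\Sigma x\|_\infty$. Let $x_1, x_2 \in \mathbb{R}^N$. Now we can write

$$\begin{aligned}
|F(x_1) - F(x_2)| &= \left| \|\Sigma x_1\|_\infty - \|\Sigma x_2\|_\infty \right| \\
&\le \|\Sigma(x_1 - x_2)\|_\infty \\
&\le \|\Sigma\|_{\infty,2} \|x_1 - x_2\|_2.
\end{aligned}$$

The second line follows from the triangle inequality, and in the third line, $\|\Sigma\|_{\infty,2}$ denotes the operator norm of $\Sigma$ from $\mathbb{R}^N$ equipped with the $\ell_2$-norm to $\mathbb{C}^N$ equipped with the $\ell_\infty$-norm. Let $B_2$ denote the unit $\ell_2$-ball in $\mathbb{R}^N$. Note that we have

$$\begin{aligned}
\|\Sigma\|_{\infty,2}^2 &= \left( \sup_{x \in B_2} \max_{1 \le i \le N} |(\Sigma x)_i| \right)^2 \\
&= \max_{1 \le i \le N} \sup_{x \in B_2} \Big| \sum_{l=1}^N \Sigma_{il} x_l \Big|^2 \\
&\le \max_{1 \le i \le N} \sum_{l=1}^N |\Sigma_{il}|^2 \\
&= \max_{1 \le i \le N} (\Sigma\Sigma^*)_{ii} \\
&= \max_{1 \le i \le N} \Gamma_{ii} \\
&= \max_{1 \le i \le N} \mathrm{E}|G(t_i)|^2.
\end{aligned}$$

The third line is a consequence of the Cauchy-Schwarz inequality for complex-valued numbers. Since $\max_i \mathrm{E}|G(t_i)|^2 \le \nu^2$, $F(\cdot)$ is a $\nu$-Lipschitz function of its argument (which is a standard Gaussian vector here). So we can invoke, for example, [22, eq. (2.35)] to get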



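$$\begin{aligned}
\mathrm{P}\left\{ F(h) \ge \mathrm{E}F(h) + \lambda \right\}
&= \mathrm{P}\Big\{ \max_{1 \le i \le N} |(\Sigma h)_i| \ge \mathrm{E} \max_{1 \le i \le N} |(\Sigma h)_i| + \lambda \Big\} \\
&= \mathrm{P}\Big\{ \max_{1 \le i \le N} |G(t_i)| \ge \mathrm{E} \max_{1 \le i \le N} |G(t_i)| + \lambda \Big\} \\
&\le \exp\left( \frac{-\lambda^2}{2\nu^2} \right).
\end{aligned}$$

Now we can apply monotone convergence to the above inequality; letting $N \to \infty$, we come to the desired result.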